Abstract. We provide a decidable characterization of regular forest languages definable
in FO^2(<v, <h). By FO^2(<v, <h) we refer to the two-variable fragment of first-order
logic built from the descendant relation and the following sibling relation. In terms of
expressive power it corresponds to a fragment of the navigational core of XPath that
contains modalities for going up to some ancestor, down to some descendant, left to some
preceding sibling, and right to some following sibling.

We also show that our techniques can be applied to other two-variable first-order logics
having exactly the same vertical modalities as FO^2(<v, <h) but having different horizontal
modalities.


1. Introduction 

Logics for expressing properties of labeled trees and forests figure importantly in several 
different areas of Computer Science. This paper is about logics on finite unranked trees. 
The most prominent one is monadic second-order logic (MSO) as it can be captured by 
finite tree automata. All the logics we consider are less expressive than monadic second- 
order logic. Even with these restrictions, this encompasses a large body of important logics, 
such as variants of first-order logic, temporal logics including CTL* or CTL, as well as query 
languages used for XML data. 

This paper is part of a research program devoted to understanding and comparing the 
expressive power of such logics. 

We say that a logic has a decidable characterization if the following problem is decidable:
given as input a finite tree automaton (or equivalently a formula of MSO), decide if
the recognized language is definable by the logic in question. Usually a decidable
characterization requires a solid understanding of the expressive power of the corresponding
logic as witnessed by decades of research, especially for logics for strings. The main open
problem in this research program is to find a decidable characterization of FO(<v), the
first-order logic using a binary predicate <v for the ancestor relation.

ACM CCS: [Theory of computation]: Logic; Formal languages and automata theory—Tree 
languages. 

Key words and phrases: Tree Languages, Tree Automata, Two-Variables First-Order Logic,
Characterization. 
In this paper we work with unranked ordered trees and by FO(<v, <h) we refer to the
logic that has two binary predicates, one for the descendant relation, one for the following
sibling relation. We investigate an important fragment of FO(<v, <h), its two-variable
restriction denoted FO^2(<v, <h). This is a robust formalism that, in terms of expressive
power, has an equivalent counterpart in temporal logic. This temporal counterpart can
be seen as the fragment of the navigational core of XPath that does not use the successor
axis [Mar05]. More precisely, it corresponds to the temporal logic EF + F^-1(F_h, F_h^-1) that
navigates in the tree using two "vertical" modalities, one for going to some ancestor node
(F^-1) and one for going to some descendant node (EF), and two "horizontal" modalities
for going to some following sibling (F_h) or some preceding sibling (F_h^-1).

We provide a characterization of FO^2(<v, <h), or equivalently EF + F^-1(F_h, F_h^-1),
over unranked ordered trees. We also show that this characterization is decidable. Since
FO^2(<v, <h) can express the fact that a tree has rank k for any fixed number k, our result
also applies to ranked trees.

Our characterization is stated using closure properties expressed partly using identities 
that must be satisfied by the syntactic forest algebra of the input regular language, and 
partly via a mechanism that we call saturation. 

Here, a forest algebra is essentially a pair of finite semigroups, the "horizontal" semigroup
for forest types and the "vertical" semigroup for context types, together with an
action of contexts over forests. It was introduced in [BW07], using monoids instead of
semigroups, and is a formalism for recognizing forest languages whose expressive power is
equivalent to definability in MSO. Given a formula of MSO, one can compute its syntactic
forest algebra, which recognizes the set of forests satisfying the formula. Hence any
characterization based on a finite set of identities over the syntactic forest algebra can be
tested effectively when given a regular language as long as each identity can be effectively
tested, which will always be the case in this paper.

The syntactic forest algebra was used successfully for obtaining decidable characterizations
for the classes of tree languages definable in EF + EX [BW06], EF + F^-1 [Boj09],
BΣ1(<v) [BSS12] and Δ2(<v) [BS10]. Here EF + EX is the class of languages definable
in a temporal logic that navigates in trees using two vertical modalities, EF, that we have
already seen before, and EX, which goes to a child of the current node. EF + F^-1 is the
class of languages definable in EF + F^-1(F_h, F_h^-1) without using the horizontal modalities.
BΣ1(<v) stands for the class of languages definable by a Boolean combination of existential
formulas of FO(<v) and Δ2(<v) is the class of languages definable in FO(<v) by both a
formula of the form ∃*∀* and a formula of the form ∀*∃*.

Over strings, the logics induced by Δ2(<), FO^2(<) and F + F^-1 have exactly the
same expressive power [EVW02, TW98]. But over trees this is not the case. For instance
EF + F^-1 is closed under bisimulation while the other two are not. While decidable
characterizations were obtained for EF + F^-1 and Δ2(<v) [Boj09, BS10], the important case
of FO^2(<v, <h) was still missing and is solved in this paper.

Over strings, a regular language is definable in FO^2(<) iff its syntactic semigroup
satisfies an identity that can be effectively tested [TW98]. Not surprisingly our first set of
identities requires that the horizontal and vertical semigroups of the syntactic forest algebra
both satisfy this identity. Our extra property is more complex and mixes at the same time
the vertical and horizontal navigational power of FO^2(<v, <h). We call it closure under
saturation.
It is immediate from the string case that being definable in FO^2(<v, <h) implies that
the vertical and horizontal semigroups of the syntactic forest algebra satisfy the required
identity. That closure under saturation is also necessary is proved via a classical, but
tedious, Ehrenfeucht-Fraïssé game argument. As usual in this area, the difficulty is to show
that the closure conditions are sufficient. In order to do so, as is standard when dealing
with FO^2(<v, <h) (see e.g. [Boj09, BS10, TW98]), we introduce Green-like relations for
comparing elements of the syntactic algebra. However, in our case, we parametrize these
relations with a set of forbidden patterns: the contexts authorized for going from one type
to another type cannot use any of the forbidden patterns. We are then able to perform an
induction using this set of forbidden patterns, thus refining our comparison relations more
and more until they become trivial.

Our proof has many similarities with the one of Bojańczyk that provides a decidable
characterization for the logic EF + F^-1 [Boj09] and we reuse several ideas developed in that
paper. However it departs from it in many essential ways. First of all the closure under
bisimulation of EF + F^-1 was used in [Boj09] in an essential way in order to compute a
subalgebra and perform inductions on the size of the algebra. Moreover, because EF + F^-1
does not have horizontal navigation, Bojańczyk was able to isolate certain labels and then
also perform inductions on the size of the alphabet. It is the combination of the induction
on the size of the alphabet and on the size of the algebra that gave an elegant proof of
the correctness of the identities for EF + F^-1 given in [Boj09]. The logic FO^2(<v, <h) is
no longer closed under bisimulation and we were not able to perform an induction on the
algebra. Moreover because our logic has horizontal navigation, it is no longer possible to
isolate the label of a node from the labels of its siblings, hence it is no longer possible to
perform an induction on the size of the alphabet. In order to overcome these problems
our proof replaces the inductions used in [Boj09] by an induction on the set of forbidden
patterns. This makes the two proofs technically fairly different.

It turns out that our proof technique applies to various horizontal modalities. In the final
section of the paper we show how to adapt the characterization obtained for FO^2(<v, <h)
in order to obtain characterizations for EF + F^-1(X_h, F_h, X_h^-1, F_h^-1), EF + F^-1(S) and
EF + F^-1(S^≠), where X_h, X_h^-1, S and S^≠ are horizontal navigational modalities moving
respectively to the next sibling, previous sibling, an arbitrary sibling including the current
node, or an arbitrary different sibling excluding the current node.


Other related work. Our characterization is essentially given using forest algebras. There
exist several other formalisms that were used for providing characterizations of logical
fragments of MSO (see e.g. [BS09, PS11, Wil96, EW05]). It is not clear however how to use
these formalisms in order to provide a characterization of FO^2(<v, <h).

There exist decidable characterizations of EF + F^-1 and Δ2(<v) over trees of bounded
rank [Pla08]. But, as these logics cannot express the fact that a tree is binary, the unranked
and bounded rank characterizations are different. As mentioned above, we don't have this
problem with FO^2(<v, <h).


Organization of the paper. We first provide the preliminary definitions in Section 2. The
main definitions and their basic properties are described in Section 5. Our characterization is
stated in Section 6. That our properties are necessary for being definable in FO^2(<v, <h) is
proved in Section 7. We give the proof that our characterization for FO^2(<v, <h) is sufficient
in Section 8. Decidability of closure under saturation is not immediate and Section 9 is
devoted to this issue. In Section 10 we show how to adapt the arguments in order to
characterize several other horizontal navigation modalities.

Note that Section 7, Section 8 and Section 9 can be read in an arbitrary order.

This paper is the journal version of [PS10]. From the conference version the statement
of the characterization has been slightly changed and the proofs have been significantly 
modified in order to simplify the presentation. 

2. Preliminaries 

We work with finite unranked ordered trees and forests labeled over a finite alphabet 𝔸. A
finite alphabet is a pair 𝔸 = (A, B) where A and B are finite sets of labels. We use A to
label leaves and B to label inner nodes. Making the distinction between leaf labels and
inner node labels makes our presentation slightly simpler without harming the generality of
our results. Given a finite alphabet 𝔸 = (A, B), trees and forests are defined inductively as
follows: for any a ∈ A, a is a tree. If t_1, ..., t_k is a finite non-empty sequence of trees then
t_1 + ... + t_k is a forest. If s is a forest and b ∈ B, then b(s) is a tree. Notice that we do not
consider empty trees nor empty forests. A set of trees (forests) over a finite alphabet 𝔸 is
called a tree language (forest language).
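As an illustration only, the inductive definition of trees and forests can be sketched in Python. The names Tree, is_valid_forest and size are assumptions introduced for this sketch; they do not come from the paper.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class Tree:
    label: str
    children: Tuple["Tree", ...] = ()  # empty tuple: the node is a leaf


def is_valid_forest(forest, leaf_labels, inner_labels):
    """Check the inductive definition: a forest is a non-empty sequence of
    trees, leaves carry labels from A and inner nodes carry labels from B."""
    if len(forest) == 0:
        return False  # empty forests are not considered
    for tree in forest:
        if tree.children:
            if tree.label not in inner_labels:
                return False
            if not is_valid_forest(tree.children, leaf_labels, inner_labels):
                return False
        elif tree.label not in leaf_labels:
            return False
    return True


def size(forest):
    """Number of nodes of a forest."""
    return sum(1 + size(tree.children) for tree in forest)


# The forest b(a1 + a2) + a1 over A = {a1, a2} and B = {b}.
example = (Tree("b", (Tree("a1"), Tree("a2"))), Tree("a1"))
```

Here a forest is a plain tuple of trees, so the + operation on forests is simply tuple concatenation.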

We use standard terminology for trees and forests defining nodes, ancestors, descendants,
following and preceding siblings. We write x <v y to say that x is a strict ancestor
of y or, equivalently, that y is a strict descendant of x. We write x <h y to say that x is a
strict preceding sibling of y or, equivalently, that y is a strict following sibling of x.

A context is a forest over (A ∪ {□}, B) with a single leaf of label □ that cannot be
a root and that has no sibling. This distinguished node is called the port of the context
(see Figure 1). This definition is not standard as usually contexts are defined without the
"no sibling" restriction but it is important for this paper to work with this non-standard
definition. If c is a context, the path in c containing all the ancestors of its port is called
the backbone of c.

A context c can be composed with another context c' or with a forest s in the obvious
way by substituting c' or s in place of the port of c. This composition yields either the
context cc' or the forest cs.
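The composition can be sketched as follows, assuming a forest is encoded as a tuple of (label, children) pairs and the port is a leaf carrying the reserved label "□". The function name plug and this encoding are assumptions made for the illustration.

```python
PORT = "□"  # reserved label, assumed not to belong to the alphabet


def plug(context, filler):
    """Substitute `filler` (a forest or a context) for the port of `context`.

    By definition the port is never a root and has no sibling, so it is
    always the unique child of its parent."""
    result = []
    for label, children in context:
        if len(children) == 1 and children[0] == (PORT, ()):
            result.append((label, tuple(filler)))
        else:
            result.append((label, plug(children, filler)))
    return tuple(result)


# c = a + b(□), c' = d(□), s = a + c
c = (("a", ()), ("b", ((PORT, ()),)))
c_prime = (("d", ((PORT, ()),)),)
s = (("a", ()), ("c", ()))
```

With this encoding, plug(c, s) is the forest cs and plug(c, c_prime) is the context cc'.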


[Figure panels: "Context" | "Not a Context"]


Figure 1: Illustration of the notion of context. The squared nodes represent ports. The 
right part is not a context because the port has a sibling. 

If x is a node of a forest then the subtree at x is the tree rooted at x. The subforest of
x is the forest consisting of all the subtrees that are rooted at siblings of x (including x).
Finally, if x is not a leaf, the subforest below x is the forest consisting of all the subtrees
that are rooted at children of x, see Figure 2. Notice that from the definitions it follows
that s is a subforest of a forest t iff there exists a context c such that t = cs. In particular
if we consider the forest b(a_1 + a_2 + a_3), a_1 + a_2 + a_3 is a subforest but not a_1 + a_2.




[Figure panels: "Subtree at x" | "Subforest below x"]


Figure 2: Illustration of the notion of subtrees and subforests 


3. The Logic FO^2(<v, <h)

3.1. Definition. A forest can be seen as a relational structure. The domain of the structure
is the set of nodes. The signature contains a unary predicate P_a for each symbol a of 𝔸
plus the binary predicates <v and <h. By MSO(<v, <h) we denote the monadic second-order
logic over this relational signature. We use the classical semantics for MSO(<v, <h)
and write s |= φ(ū) if the formula φ is true on s when interpreting its free variables with
the corresponding nodes of ū. As usual, each sentence φ of MSO(<v, <h) defines a forest
language L_φ = {s | s |= φ}. A language defined in MSO(<v, <h) is called a regular language.
As usual regular languages form a robust class of languages and there is a matching notion
of unranked ordered forest automata (see for instance [CDG+07, chapter 8]). We will see in
Section 4 a corresponding notion of recognizability using forest algebras.

The logic of interest for this paper is FO^2(<v, <h), the two-variable restriction of the
first-order fragment of MSO(<v, <h).

In terms of expressive power, FO^2(<v, <h) is equivalent to EF + F^-1(F_h, F_h^-1), a temporal
logic that we now describe. Essentially, EF + F^-1(F_h, F_h^-1) is the restriction of
the navigational core of XPath without the child, parent, next-sibling and previous-sibling
predicates. It is defined using the following grammar, where a ranges over the labels of 𝔸:

    φ ::= a | φ ∨ φ | φ ∧ φ | ¬φ | EF φ | F^-1 φ | F_h φ | F_h^-1 φ

We use the classical semantics for this logic which defines when a formula holds at a node
x of a forest s. In particular, EF φ holds at x if there is a strict descendant of x where φ
holds, F^-1 φ holds at x if there is a strict ancestor of x where φ holds, F_h φ holds at x if
φ holds at some strict following sibling of x, and F_h^-1 φ holds at x if φ holds at some strict
preceding sibling of x. We then say that a forest s satisfies a formula φ if φ holds at the
root of the first tree of s. The following result is immediate from [Mar05]:
Theorem 3.1. For any FO^2(<v, <h) formula φ(x) there exists an EF + F^-1(F_h, F_h^-1)
formula ψ holding true on the same set of nodes for every forest. In particular a forest
language is definable in FO^2(<v, <h) iff it is definable in EF + F^-1(F_h, F_h^-1).
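The semantics of the temporal logic described above can be made concrete with a small evaluator. This is a minimal sketch under our own assumptions: forests are nested (label, children) pairs, formulas are nested tuples tagged "label", "not", "or", "and", "EF", "Finv", "Fh" and "Fhinv", and all function names are hypothetical.

```python
def index_forest(forest):
    """Flatten a forest of nested (label, children) pairs into a node table."""
    nodes = []

    def visit(tree, parent):
        label, children = tree
        i = len(nodes)
        nodes.append({"label": label, "parent": parent, "children": []})
        for child in children:
            nodes[i]["children"].append(visit(child, i))
        return i

    roots = [visit(tree, None) for tree in forest]
    return nodes, roots


def descendants(nodes, x):
    found, stack = [], list(nodes[x]["children"])
    while stack:
        y = stack.pop()
        found.append(y)
        stack.extend(nodes[y]["children"])
    return found


def ancestors(nodes, x):
    found, p = [], nodes[x]["parent"]
    while p is not None:
        found.append(p)
        p = nodes[p]["parent"]
    return found


def holds(nodes, roots, f, x):
    """Does formula f hold at node x? All four modalities are strict."""
    op = f[0]
    if op == "label":
        return nodes[x]["label"] == f[1]
    if op == "not":
        return not holds(nodes, roots, f[1], x)
    if op == "or":
        return holds(nodes, roots, f[1], x) or holds(nodes, roots, f[2], x)
    if op == "and":
        return holds(nodes, roots, f[1], x) and holds(nodes, roots, f[2], x)
    if op == "EF":
        return any(holds(nodes, roots, f[1], y) for y in descendants(nodes, x))
    if op == "Finv":
        return any(holds(nodes, roots, f[1], y) for y in ancestors(nodes, x))
    parent = nodes[x]["parent"]
    sibs = roots if parent is None else nodes[parent]["children"]
    k = sibs.index(x)
    if op == "Fh":
        return any(holds(nodes, roots, f[1], y) for y in sibs[k + 1:])
    if op == "Fhinv":
        return any(holds(nodes, roots, f[1], y) for y in sibs[:k])
    raise ValueError("unknown operator: %r" % (op,))


def satisfies(forest, f):
    """A forest satisfies f if f holds at the root of its first tree."""
    nodes, roots = index_forest(forest)
    return holds(nodes, roots, f, roots[0])


# The forest b(a + c) + a.
example = [("b", [("a", []), ("c", [])]), ("a", [])]
```

For instance, satisfies(example, ("EF", ("and", ("label", "a"), ("Fh", ("label", "c"))))) asks whether some descendant of the first root is labeled a and has a following sibling labeled c.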

We aim at providing a decidable characterization of regular forest languages definable in
FO^2(<v, <h). This means finding an algorithm that decides whether or not a given regular
forest language is definable in FO^2(<v, <h).

Note that FO^2(<v, <h) is expressive enough to test whether a forest is a tree and, for
each k, whether it has rank k. Hence any result concerning forest languages definable in
FO^2(<v, <h) also applies to tree languages definable in FO^2(<v, <h) and covers the ranked
and unranked cases.

We shall mostly adopt the EF + F^-1(F_h, F_h^-1) point of view as it is useful when considering
other horizontal modalities or when making comparisons with the decision algorithm
obtained for EF + F^-1 in [Boj09].

3.2. Ehrenfeucht-Fraïssé Games. As usual definability in FO^2(<v, <h) corresponds to
winning strategies in an Ehrenfeucht-Fraïssé game that we briefly describe here. We adopt
the EF + F^-1(F_h, F_h^-1) point of view as the corresponding game is slightly simpler. Its
definition is standard.

There are two players, Duplicator and Spoiler. The board consists of two forests and
the number k of rounds is fixed in advance. At any time during the game there is one pebble 
placed on a node of one forest and one pebble placed on a node of the other forest and both 
nodes have the same label. If the initial position is not specified, the game starts with the 
two pebbles placed on the root of the leftmost tree in each forest. Each round starts with 
Spoiler moving one of the pebbles inside its forest, either to some ancestor of its current 
position, or to some descendant or to some left or right sibling. Duplicator must respond by 
moving the other pebble inside the other forest in the same direction to a node of the same 
label. If during a round Duplicator cannot move then Spoiler wins the game. If Duplicator 
was able to respond to all the moves of Spoiler then she wins the game. Winning strategies 
are defined as usual. If Duplicator has a winning strategy for the game played on the forests 
s, t then we say that s and t are k-equivalent.
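On small forests the game can be solved by exhaustive search. The following sketch assumes forests are encoded as nested (label, children) pairs and treats two forests whose starting nodes carry different labels as not equivalent; both the encoding and this convention are our assumptions, as are the function names.

```python
from functools import lru_cache


def index_forest(forest):
    """Flatten a forest of nested (label, children) pairs into a node table."""
    nodes = []

    def visit(tree, parent):
        label, children = tree
        i = len(nodes)
        nodes.append({"label": label, "parent": parent, "children": []})
        for child in children:
            nodes[i]["children"].append(visit(child, i))
        return i

    roots = [visit(tree, None) for tree in forest]
    return nodes, roots


def moves(nodes, roots, x):
    """Nodes reachable from x in one move, grouped by direction."""
    parent = nodes[x]["parent"]
    sibs = roots if parent is None else nodes[parent]["children"]
    k = sibs.index(x)
    up, p = [], parent
    while p is not None:
        up.append(p)
        p = nodes[p]["parent"]
    down, stack = [], list(nodes[x]["children"])
    while stack:
        y = stack.pop()
        down.append(y)
        stack.extend(nodes[y]["children"])
    return {"up": up, "down": down, "left": sibs[:k], "right": sibs[k + 1:]}


def k_equivalent(s, t, k):
    """Does Duplicator win the k-round game started on the leftmost roots?"""
    boards = (index_forest(s), index_forest(t))

    @lru_cache(maxsize=None)
    def duplicator_wins(x, y, rounds):
        if rounds == 0:
            return True
        position = (x, y)
        for side in (0, 1):  # Spoiler may play in either forest
            own = moves(boards[side][0], boards[side][1], position[side])
            other = moves(boards[1 - side][0], boards[1 - side][1], position[1 - side])
            for direction, targets in own.items():
                for z in targets:
                    wanted = boards[side][0][z]["label"]
                    answered = False
                    for w in other[direction]:
                        if boards[1 - side][0][w]["label"] != wanted:
                            continue
                        nxt = (z, w) if side == 0 else (w, z)
                        if duplicator_wins(nxt[0], nxt[1], rounds - 1):
                            answered = True
                            break
                    if not answered:
                        return False
        return True

    x0, y0 = boards[0][1][0], boards[1][1][0]
    if boards[0][0][x0]["label"] != boards[1][0][y0]["label"]:
        return False  # assumption: mismatched starting labels mean Spoiler wins
    return duplicator_wins(x0, y0, k)


# b(a) and b(a + a): one round cannot tell them apart, two rounds can.
s = [("b", [("a", [])])]
t = [("b", [("a", []), ("a", [])])]
```

In the second round Spoiler moves from the first leaf of t to its right sibling, and Duplicator has no right sibling to answer with in s.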

The following results on games are classical and simple to prove. 

Lemma 3.2 (Folklore). 

(1) For every k, k-equivalence is an equivalence relation of finite index. 

(2) For every k, each class of the k-equivalence relation is definable by a sentence of
EF + F^-1(F_h, F_h^-1) such that the nesting depth of its navigational modalities is bounded
by k.

(3) For every k, if s and t are k-equivalent then they satisfy the same sentences of
EF + F^-1(F_h, F_h^-1) such that the nesting depth of their navigational modalities is
bounded by k.

When played on words instead of forests, the game is the same except that now Spoiler can
move either to a previous or to a following position. The results are identical after replacing
FO^2(<v, <h) with FO^2(<), the two-variable first-order logic on strings, using the predicate
< for the following position relation.
3.3. Antichain Composition Principle. As mentioned in the introduction, we use induction
to prove that if L satisfies the characterization then we can construct an FO^2(<v, <h)
formula for L. At each step in this construction, we prove that L can be defined as the
composition of simpler languages such that a formula for L can be constructed from formulas
defining the simpler languages. This is what we do with the following simple composition
lemma, essentially adapted from [Boj09] and using the same terminology.

A formula of FO^2(<v, <h) with one free variable is called antichain if in every forest,
the set of nodes where it holds forms an antichain, i.e. a set (not necessarily maximal) of
nodes pairwise incomparable with respect to the descendant relation. This is a semantic
property that may not be apparent just by looking at the syntax of the formula. A typical
antichain formula selects in a forest the set of nodes of label b ∈ B that have no ancestor
of label b.

Given (i) an antichain formula φ, (ii) disjoint forest languages L_1, ..., L_n, (iii) labels
a_1, ..., a_n ∈ A and (iv) a forest s, we define the forest s' = s[(L_1, φ) → a_1, ..., (L_n, φ) → a_n]
as follows. For each node x of s such that s |= φ(x), we determine the unique i such
that the forest language L_i contains the subforest below x. If such an i exists, we remove the
whole subforest below x, and replace it by a leaf of label a_i. Since φ is antichain, this can
be done simultaneously for all x. Note that the formula φ may also depend on ancestors of
x, while the languages L_i only talk about the subforest below x.
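The substitution can be sketched as a single top-down pass. In this illustration the antichain formula is modeled by a hypothetical Python predicate selects(label, ancestor_labels), which only covers formulas depending on the label of a node and on the labels of its ancestors, and each language L_i is modeled by a predicate on forests. These interfaces are assumptions made for the example.

```python
def substitute(forest, selects, rules, ancestor_labels=()):
    """Compute s[(L_1, phi) -> a_1, ..., (L_n, phi) -> a_n].

    forest  : tuple of (label, children) pairs
    selects : stand-in for the antichain formula phi
    rules   : list of (language_predicate, leaf_label) pairs, where the
              languages are assumed to be pairwise disjoint
    """
    result = []
    for label, children in forest:
        if children and selects(label, ancestor_labels):
            matching = [leaf for language, leaf in rules if language(children)]
            if matching:
                # Replace the whole subforest below the node by a single leaf.
                result.append((label, ((matching[0], ()),)))
                continue
        below = substitute(children, selects, rules, ancestor_labels + (label,))
        result.append((label, below))
    return tuple(result)


def selects_topmost_b(label, ancestor_labels):
    """Typical antichain formula: label b with no ancestor of label b."""
    return label == "b" and "b" not in ancestor_labels


def has_root_c(forest):
    return any(label == "c" for label, _ in forest)


rules = [(has_root_c, "x"), (lambda f: not has_root_c(f), "y")]

# d(b(a + c) + b(a)) + a
example = (("d", (("b", (("a", ()), ("c", ()))), ("b", (("a", ()),)))), ("a", ()))
```

Because the selected nodes form an antichain, processing a selected node never interferes with another selected node, which is why one pass suffices.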

The composition method that we will use is summarized in the following lemma:

Lemma 3.3. [Antichain Composition Lemma] Let φ, L_1, ..., L_n and a_1, ..., a_n be as above.
If L_1, ..., L_n and K are languages definable in FO^2(<v, <h), then so is

    L = {t | t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n] ∈ K}.

This lemma follows from a simple Ehrenfeucht-Fraïssé game argument. Using the
EF + F^-1(F_h, F_h^-1) point of view we can also construct a formula defining L. The formula
for L is obtained from the one for K by restricting all navigation to nodes that are not
descendants of nodes satisfying φ and by replacing each test that a label is a_i by the formula
for L_i where all navigations are now restricted to nodes that are descendants of a node that
satisfies φ. The fact that φ is antichain makes this construction sound. The details are
simple and are omitted here as they paraphrase those given in [Boj09] for EF + F^-1.

The inductive step of our proof consists in exhibiting L_1, ..., L_n and K, together with
an antichain formula φ such that L = {t | t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n] ∈ K} and
K, L_1, ..., L_n have smaller inductive parameters than L. In [Boj09] the antichain formula
is of the form: "select the set of nodes of label b ∈ B that have no ancestor of label b."
Observe that such a formula allows us to use the size of B as an induction parameter as
K does not contain the label b. In our case, we replace B by sibling patterns that we will
define in Section 5.


4. Forest algebras 

A key ingredient in our characterization is based on syntactic forest algebras. Forest algebras
were introduced by Bojańczyk and Walukiewicz as an algebraic formalism for studying
regular forest languages [BW07]. We work with the semigroup variant of forest algebras.
Moreover we require that the port of each context has no sibling. These restrictions are
necessary as, without them, the languages definable in FO^2(<v, <h) would not form a
variety, i.e. would not be characterizable by their syntactic forest algebras only.
We give a brief summary of the definition of forest algebras and of their important
properties. More details can be found in [BW07]. A forest algebra consists of a pair (H, V)
of finite semigroups, subject to some additional requirements, which we describe below. We
write the operation in V multiplicatively and the operation in H additively, although H is
not assumed to be commutative.

We require that V acts on the left of H. That is, there is a map

    (h, v) ∈ H × V ↦ vh ∈ H

such that

    w(vh) = (wv)h

for all h ∈ H and v, w ∈ V. We further require that for every g ∈ H and v ∈ V, V contains
elements (v + g) and (g + v) such that

    (v + g)h = vh + g,    (g + v)h = g + vh

for all h ∈ H.

Let 𝔸 = (A, B) be a finite alphabet. The free forest algebra on 𝔸, denoted by 𝔸^Δ, is
the pair of semigroups (H_𝔸, V_𝔸) where H_𝔸 is the set of forests over 𝔸 equipped with the
+ operation and V_𝔸 the set of contexts equipped with the composition operation, together
with the natural action. One can verify that this action turns 𝔸^Δ into a forest algebra.

A morphism α : (H_1, V_1) → (H_2, V_2) of forest algebras is a pair (γ, δ) of semigroup
morphisms γ : H_1 → H_2, δ : V_1 → V_2 such that γ(vh) = δ(v)γ(h) for all h ∈ H_1, v ∈ V_1.
However, we will abuse notation slightly and denote both component maps by α.

We say that a forest algebra (H, V) recognizes a forest language L if there is a morphism
α : 𝔸^Δ → (H, V) and a subset X of H such that L = α^-1(X). We also say that the
morphism α recognizes L. It is easy to show that a forest language is regular if and only if
it is recognized by a finite forest algebra.

Consider some forest language L over an alphabet 𝔸. We define an equivalence relation
∼_L over contexts and over forests. Given two forests t_1, t_2 we say that t_1 ∼_L t_2 iff for any
two forests s, s' and any context c, c(s + t_1 + s') ∈ L iff c(s + t_2 + s') ∈ L. Given two
contexts c_1, c_2 we say that c_1 ∼_L c_2 iff for any forest s, c_1 s ∼_L c_2 s. This equivalence is
a congruence of forest algebras that is of finite index iff L is regular. The quotient of 𝔸^Δ
by this congruence yields a forest algebra recognizing L which we call the syntactic forest
algebra of L. The mapping sending a forest or a context to its equivalence class in the
syntactic forest algebra, denoted α_L, is a morphism called the syntactic morphism of L.

It is also important to know that given an MSO(<v, <h) sentence φ, the syntactic forest
algebra of L_φ and the syntactic morphism can be computed from φ.


Idempotents. It follows from standard arguments of semigroup theory that given any finite
semigroup S, there exists a number ω(S) (denoted by ω when S is understood from the
context) such that for each element x of S, x^ω is an idempotent: x^ω = x^ω x^ω. Given a forest
algebra (H, V) we will denote by ω(H, V) (or just ω when (H, V) is understood from the
context) the product of ω(H) and ω(V) and for any element v ∈ V and g ∈ H we will write
v^ω and ωg for the corresponding idempotents.

Finally, given a semigroup S we will denote by S^1 the monoid formed from S by adding
a neutral element.
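For a finite semigroup given explicitly, the idempotent power of an element and a suitable exponent ω can be found by brute force. This is a minimal sketch: the semigroup is passed as a list of elements together with a binary function mul, an interface assumed here for illustration, and omega returns the smallest exponent that works.

```python
def idempotent_power(x, mul):
    """Return the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def power(x, n, mul):
    """Compute x^n for n >= 1."""
    result = x
    for _ in range(n - 1):
        result = mul(result, x)
    return result


def omega(elements, mul):
    """Smallest n >= 1 such that x^n is idempotent for every element x.

    Terminates when mul is an associative operation on a finite set."""
    elements = list(elements)
    n = 1
    while True:
        if all(mul(p, p) == p for p in (power(x, n, mul) for x in elements)):
            return n
        n += 1


def mul_mod4(a, b):
    return (a * b) % 4


def add_mod4(a, b):
    return (a + b) % 4
```

For the integers modulo 4 under multiplication the exponent 2 suffices, while for the cyclic group of order 4 the smallest suitable exponent is 4.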
Leaf Surjective Morphisms. Let 𝔸 = (A, B) be a finite alphabet and let α : 𝔸^Δ → (H, V)
be a morphism into a finite forest algebra (H, V). We say that α is leaf surjective iff for
any h ∈ H, there exists a ∈ A such that α(a) = h.

Observe that given any morphism α : (A, B)^Δ → (H, V), one can construct a leaf
surjective one β : (A ∪ H, B)^Δ → (H, V) by extending α in the obvious way. We call β the
leaf completion of α.

Lemma 4.1. Let α : 𝔸^Δ → (H, V) be a morphism into a finite forest algebra, β be its
leaf completion and h ∈ H be such that β^-1(h) is FO^2(<v, <h) definable. Then α^-1(h) is
FO^2(<v, <h) definable.

Proof. A forest in α^-1(h) is a forest in β^-1(h) that contains no leaf with label in H.
Therefore, one can construct an FO^2(<v, <h) formula for α^-1(h) from a formula for β^-1(h). □

Working with leaf surjective morphisms will be convenient for us. Typically when 
applying the Antichain Composition Lemma we will construct K from L by replacing some 
subforests of L by leaf nodes with the same forest type. It is therefore important that such 
nodes exist. 

The string case. A reason for using syntactic forest algebras is that the same problem for
strings was solved using syntactic semigroups. In the string case there is only one linear
order and the corresponding logic is denoted by FO^2(<). Recall that the syntactic semigroup
(monoid) of a regular string language is the transition semigroup (monoid) of its minimal
deterministic automaton. It is therefore computable from any reasonable presentation of the
regular string language. The following characterization was actually stated using syntactic
monoids, but it is equivalent to this statement.¹

Theorem 4.2 ([TW98]). A regular string language is definable in FO^2(<) iff its syntactic
semigroup S satisfies for all u, v ∈ S:

    (uv)^ω v (uv)^ω = (uv)^ω
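Testing the identity of Theorem 4.2 on an explicitly given finite semigroup is a direct enumeration over all pairs (u, v). The sketch below assumes the semigroup is presented as a list of elements with a binary function mul; computing the syntactic semigroup of a language is outside its scope, and the function names are hypothetical.

```python
from itertools import product


def idempotent_power(x, mul):
    """Return the unique idempotent among the powers x, x^2, x^3, ..."""
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def satisfies_identity(elements, mul):
    """Check (uv)^w v (uv)^w = (uv)^w for all u, v in the semigroup."""
    elements = list(elements)
    for u, v in product(elements, repeat=2):
        e = idempotent_power(mul(u, v), mul)
        if mul(mul(e, v), e) != e:
            return False
    return True


def and_op(a, b):
    """The two-element semilattice {0, 1} under multiplication."""
    return a * b


def xor_op(a, b):
    """The two-element group {0, 1} under addition modulo 2."""
    return (a + b) % 2
```

The semilattice satisfies the identity, whereas the two-element group does not: taking u = 0 and v = 1 in the group gives (uv)^ω = 0 but (uv)^ω v (uv)^ω = 1.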

Unfortunately in the case of forest languages we were not able to state our characteri¬ 
zation using only the syntactic forest algebra of the input regular language. We will need 
an extra ingredient that we call saturation. 

5. Shallow multicontexts 

In this section, we define shallow multicontexts which represent sequences of siblings. We
will often manipulate shallow multicontexts modulo FO^2(<v, <h) definability. This is
captured by an equivalence relation on shallow multicontexts that we also define in this section.

This notion of shallow multicontext is central for this paper as we will use it not only
as a parameter in the inductive argument but also to define the notion of saturation that
we use in our characterization of FO^2(<v, <h).

Set 𝔸 = (A, B) as a finite alphabet. All definitions are parametrized by a morphism
α : 𝔸^Δ → (H, V). Note that while the definitions make sense for any morphism α, they are
designed to be used with a leaf surjective one. Given a forest s (a context p) we refer to its
image under α as the forest type of s (the context type of p).


¹ We are actually not using the identity of [TW98]. Ours can easily be seen to be equivalent
to it. This is a folklore result. A proof can be found in [Kuf06].
5.1. Shallow multicontexts. A multicontext is defined in the same way as a context but
with several ports, possibly none. The arity of a multicontext is the number of its ports,
possibly 0. Note that as our forests are ordered, each multicontext implicitly defines a linear
order on its ports. A multicontext is said to be shallow if each of its trees is either a single
node a ∈ A, a single inner node with a port below, b(□), or a tree of the form b(a) where
b ∈ B and a ∈ A (see Figure 3).

For technical reasons we do not consider forests with a single tree of the form a ∈ A as
a shallow multicontext. Observe that in our definition of shallow multicontext we include
trees of the form b(a). This is because, as mentioned earlier, we will often replace a subforest
by a node having the same type and therefore it is convenient to immediately have access
to this type by looking at the label of that node.





Figure 3: Illustration of a shallow multicontext of arity 3 


Let x be a node of a forest s. Let t = t_1 + ... + t_ℓ be the subforest of x, composed of
ℓ trees. The shallow multicontext of x in s is the sequence p_1 + ... + p_ℓ such that p_i := a
if t_i = a ∈ A, p_i := b(a) if t_i = b(a), and p_i := b(□) if t_i = b(s'), where a ∈ A, b ∈ B
and s' ∉ A. A shallow multicontext p occurs in a forest s iff there exists a node x of s
such that p is the shallow multicontext of x in s. In the rest of the paper, a node x will
almost always be considered together with the shallow multicontext p occurring at x. For
this reason we will write "let (p, x) be a node of a tree t" when x is a node of t and p is the
shallow multicontext at x. Similarly, if P is a set of shallow multicontexts, we will write
"let (p, x) ∈ P" when p ∈ P and x is a node of p.
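The shallow multicontext of a node only depends on its sequence of siblings, so it can be computed tree by tree. The sketch below assumes forests are tuples of (label, children) pairs and uses the reserved label "□" for ports; the function names are ours.

```python
PORT = "□"  # reserved label, assumed not to belong to the alphabet


def shallow_multicontext(siblings):
    """Map a subforest t_1 + ... + t_l to the sequence p_1 + ... + p_l."""
    result = []
    for label, children in siblings:
        if not children:
            result.append((label, ()))                # t_i = a
        elif len(children) == 1 and not children[0][1]:
            result.append((label, (children[0],)))    # t_i = b(a)
        else:
            result.append((label, ((PORT, ()),)))     # t_i = b(s'), s' not a leaf
    return tuple(result)


def arity(p):
    """Number of ports of a shallow multicontext."""
    return sum(1 for _, below in p if below and below[0][0] == PORT)


# a + b(a) + b(a + c) + d(b(a))
siblings = (
    ("a", ()),
    ("b", (("a", ()),)),
    ("b", (("a", ()), ("c", ()))),
    ("d", (("b", (("a", ()),)),)),
)
```

On this example the result is a + b(a) + b(□) + d(□), a shallow multicontext of arity 2.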

Given a shallow multicontext p of arity n and a sequence T of n forests, p[T] denotes
the forest obtained after placing the i-th forest of T at the i-th port of p. Moreover, given
a node x of p whose unique child is a port (i.e. x is the root of a tree of the form b(□)
within p) and a sequence T of n − 1 forests, p[T, x] denotes the context obtained as above
but leaving the subtree at x unchanged.

P-Valid Forests and Contexts. Let P be a set of shallow multicontexts. Later on P
will be a key parameter for the induction. We say that a forest t is P-valid iff it has more
than one node and all shallow multicontexts occurring in t are in P. Similarly we define the
notion of a P-valid context. Note that we distinguish forests with one node in the definition.
This is a technical restriction that will be convenient without harming the generality of the
argument as the omitted forests are definable in FO^2(<v, <h). We extend the notion of
P-validity to elements of H and V. We say h ∈ H is P-valid iff there exists a P-valid forest
t such that h = α(t). Similarly for v ∈ V.

P-Reachability. The logic FO^2(<v, <h) can be seen as a two-way logic navigating up and
down within forests. Over strings, this two-way behavior is reflected by two partial orders
over the syntactic semigroup capturing respectively the current knowledge when reading
the string from left to right and from right to left, and corresponding to the Green's relations
L and R.
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Over forests it turns out that the relevant bottom-up order is a partial order on forest 
types while the relevant top-down order is a partial order on context types |Boj09| . In 
our case those are even parametrized by a set P of shallow multicontexts and are called 
P-reachability. The index within these orders will be another parameter in our induction. 

Let $h, h'$ be two $P$-valid forest types. $h$ is said to be $P$-reachable from the forest type $h'$
if there exists a $P$-valid context type $v$ such that $h = vh'$. Two forest types are $P$-equivalent
if they are mutually $P$-reachable.

The definition is similar for context types and is defined for any $v, v' \in V$, not just
$P$-valid ones. Given two context types $v, v'$ we say that $v$ is $P$-reachable from $v'$ whenever
there is a $P$-valid context type $u$ such that $v = v'u$.

Notice that for both partial orders, if $P \subseteq P'$ then $P$-reachability implies $P'$-reachability.

We will reduce the case when all shallow multicontexts of P have arity 1 or less to the 
string case. On the other hand, when P contains at least one shallow multicontext of arity 
at least 2 we will make use of the following property: 

Claim 5.1. If $P$ contains a shallow multicontext of arity at least 2 then among $P$-equivalence
classes of $P$-valid forest types there exists a unique maximal one with respect to $P$-reachability.

Proof. Take $p \in P$ of arity $n \geq 2$. Given $h, h' \in H$ that are $P$-valid, consider $t$ and $t'$ two
$P$-valid forests such that $\alpha(t) = h$ and $\alpha(t') = h'$. Consider a sequence $T$ of $n$ $P$-valid
forests containing copies of $t$ and $t'$, with at least one copy of $t$ and one copy of $t'$. Now
$\alpha(p[T])$ is $P$-reachable from both $h$ and $h'$. The result follows. □

In the cases when Claim 5.1 applies we say that $P$ is branching and we denote by
$H_P$ the maximal class given by Claim 5.1. Finally, we say that a branching set of shallow
multicontexts $P$ is reduced if all $P$-valid forest types are mutually reachable, i.e. $H_P$ is the
whole set of $P$-valid forest types.

5.2. $FO^2(<_v, <_h)$-Equivalence for Shallow Multicontexts. It will often be necessary
to manipulate shallow multicontexts modulo definability in $FO^2(<_v, <_h)$. Typically, when
applying the Antichain Composition Lemma with a formula of the form “select all nodes
whose shallow multicontext is in $P$ but have no ancestor with that property”, it will be
necessary that $P$ is definable in $FO^2(<_v, <_h)$.

Definable set of shallow multicontexts. Intuitively, $FO^2(<_v, <_h)$ treats a shallow multicontext
as a string whose letters are $a$, $b(a)$, or $b(\square)$. More formally, we define $A_s$ as the
alphabet containing the letters $a$, $b(\square)$ and $b(a)$ for all $a \in A$ and $b \in B$. We say that
$b$ is the inner-label of $b(\square)$ or $b(a)$. We see a shallow multicontext $p$ as a string over the
alphabet $A_s$.

For each positive integer $k$ and any two shallow multicontexts $p$ and $p'$, we write $p \equiv_k p'$
for the fact that Duplicator has a winning strategy in the $k$-round game played on $p$ and $p'$, seen as
strings over $A_s$. We say that a set $P$ of shallow multicontexts is $k$-definable
iff it is a union of classes of $\equiv_k$. As the name suggests, a $k$-definable set $P$ of shallow
multicontexts is definable in $FO^2(<_v, <_h)$. In particular we get the following claim, which
is an immediate consequence of Lemma 3.2.

Claim 5.2. For any $k$ and any $k$-definable set $P$ of shallow multicontexts, the language of
$P$-valid forests is definable in $FO^2(<_v, <_h)$.




Definable nodes in shallow multicontexts. It will also sometimes be necessary to refer
to an explicit node within a shallow multicontext, typically when applying the Antichain
Composition Lemma with a formula of the form “select all nodes $(p, x)$ such that ... and
having no ancestor with that property”. We will of course need to treat $(p, x)$ modulo
definability. It would be tempting to use a notion of definability similar to the one used for
shallow multicontexts in the previous paragraph. However the notion of saturation built
from this would give a necessary but not sufficient characterization for languages definable
in $FO^2(<_v, <_h)$. In particular, the induction in our sufficiency proof would not terminate (see
Lemma 8.12). Our notion of definability is based on an Ehrenfeucht-Fraïssé game relaxing
the rules defined in Section 3.2 in order to grant more power to Duplicator. Moreover it
will be useful to parametrize the game by a set $X \subseteq H$. In the sequel $X$ will be another
parameter of the induction denoting those forest types for which we are still looking for a
formula defining the set of forests of that type. When applying the Antichain Composition
Lemma, any forest of type $h \notin X$ can be safely replaced by a node with the appropriate
label, as the corresponding language is definable in $FO^2(<_v, <_h)$.

Given $X$, we distinguish between three kinds of nodes within shallow multicontexts:
port-nodes are nodes that are roots of trees of the form $b(\square)$ with $b \in B$; $X$-nodes are nodes
that are roots of trees of the form $b(a)$ with $a \in A$, $b \in B$ and $\alpha(a) \in X$; finally $\bar{X}$-nodes
are the remaining nodes, i.e. nodes with label $a \in A$ or roots of trees of the form $b(a)$ with
$a \in A$, $b \in B$ and $\alpha(a) \notin X$.
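The three kinds of nodes can be told apart from the letter alone once the types of the leaves are known. The sketch below assumes a hypothetical encoding in which a letter $a$ is a string, $b(a)$ is the pair ("b", "a"), $b(\square)$ is the pair ("b", "□"), and the restriction of the morphism to leaf labels is given as a dictionary named alpha. None of these names come from the paper.

```python
PORT = "□"  # assumed encoding of the port symbol


def node_kind(letter, alpha, X):
    """Classify a letter of a shallow multicontext as 'port', 'X' or 'X-bar'.

    alpha maps each leaf label a to its forest type; X is a set of forest types.
    """
    if isinstance(letter, tuple):
        _, child = letter
        if child == PORT:
            return "port"          # root of a tree b(□)
        if alpha[child] in X:
            return "X"             # root of b(a) with alpha(a) in X
    return "X-bar"                 # a leaf a, or b(a) with alpha(a) not in X
```

For example, with alpha = {"a": "h1", "c": "h2"} and X = {"h1"}, the letter ("b", "a") is an X-node while ("b", "c") and the leaf "a" are both classified as X-bar-nodes.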

Let $p$ and $p'$ be two shallow multicontexts, seen as strings over $A_s$. The $k$-round
$X$-relaxed game between $p$ and $p'$ is defined as in Section 3.2, but tests on labels in this
new game are relaxed, making the game easier for Duplicator. Since this alone makes the 
equivalence too permissive (this yields a non-necessary characterization), we compensate by 
giving Spoiler a third “safety” move in addition to the usual left and right sibling moves. 
Spoiler can use this third move only under specific conditions depending on the labels of 
the nodes the pebbles are currently on. These two modifications achieve the right balance 
of expressive power. 

At the start of each round Spoiler can move one of the two pebbles inside its shallow
multicontext to some left or right sibling $x$. Duplicator must respond by moving the other
pebble inside the other shallow multicontext in the same direction to a node $y$. If $x$ is an
$\bar{X}$-node then $y$ must have the same label as $x$. Otherwise, if $x$ is a port- or $X$-node, $y$
must be a port- or $X$-node with the same inner-label. Hence, at any point in the game the
pebbles may lie on nodes with labels $b(c)$, $b(c')$ with $c \neq c'$ and $c, c' \in A \cup \{\square\}$. In that
particular case Spoiler can use a “safety” move: he selects one of the two pebbles but does
not move it; by hypothesis this pebble is on a node of label $b(\square)$ or $b(a)$ with $\alpha(a) \in X$.
Duplicator must then place the other pebble on a node of label $b(\square)$.

Given two nodes $(p, x)$ and $(p', x')$, we write $(p, x) \cong^X_k (p', x')$ if i) $p$ and $p'$ contain
the same set of labels in $A_s$, ii) $x, x'$ have the same label and iii) Duplicator wins the
$k$-round $X$-relaxed game between $p$ and $p'$ starting at positions $(p, x)$ and $(p', x')$. It will
also be convenient to define $\cong^X_k$ on shallow multicontexts only. We write $p \cong^X_k p'$ iff
$(p, x) \cong^X_k (p', x')$ with $x, x'$ the leftmost positions in $p, p'$. The following claim is immediate
from the definitions:

Claim 5.3. Assume $p \cong^X_{k+2} p'$. Then for any port-node $x \in p$ (resp. $x' \in p'$) there exists a
port-node $x' \in p'$ (resp. $x \in p$) such that $(p, x) \cong^X_k (p', x')$.






Proof. Set $y, y'$ as the leftmost positions in $p, p'$. In the $(k + 2)$-round $X$-relaxed game
between $(p, y)$ and $(p', y')$, Spoiler can use the two initial rounds to move the pebble to $x'$ in
$p'$ (resp. to $x$ in $p$) and then use (if necessary) a safety move. Duplicator’s strategy yields
the desired port-node $x$ in $p$ (resp. $x' \in p'$). □

It is not immediate from the definitions that $\cong^X_k$ is an equivalence relation (transitivity
is not obvious). We prove it in the next lemma.

Lemma 5.4. For all $X \subseteq H$ and all $k$, $\cong^X_k$ is an equivalence relation.

Proof. It is clear from the definitions that the relation is reflexive and symmetric. We now
prove transitivity. Assume $(p, x) \cong^X_k (p', x')$ and $(p', x') \cong^X_k (p'', x'')$. We want to show that
$(p, x) \cong^X_k (p'', x'')$. It is clear that $p$ and $p''$ contain the same set of labels and that $x$ and
$x''$ have the same label. We need to prove that Duplicator has a winning strategy in the
$k$-round $X$-relaxed game between $(p, x)$ and $(p'', x'')$.

By hypothesis, Duplicator has winning strategies in the $k$-round $X$-relaxed games
between $(p, x)$ and $(p', x')$ and between $(p', x')$ and $(p'', x'')$. We combine these strategies
in the obvious way. Assume that Spoiler makes a non-safety move on $p$. Then Duplicator
obtains an answer in $p'$ from her strategy in the game on $p$ and $p'$, plays that answer as a
move for Spoiler in the game on $p'$ and $p''$, which yields an answer in $p''$ from her strategy
in that game. This is her answer to Spoiler’s move, and it is immediate to check that this
is a correct answer. By symmetry she can answer a similar move of Spoiler on $p''$.

Assume now that Spoiler makes a safety move on $p$. Observe that this cannot happen
in the first round since $x$ and $x''$ have the same label. Therefore after this round there will
be at most $k - 2$ rounds left to play. Let $b$ be the inner label of the node holding the pebble
in $p$. Since a safety move was used, the pebbles in $p$ and $p''$ must have different labels. Hence
at least one of these labels is different from the label of the pebble in $p'$. We distinguish two
cases depending on which one it is.

If the pebbles on $p'$ and $p''$ have different labels, then Duplicator can use a safety move
in the game between $p'$ and $p''$ to get $y''$ in $p''$ with label $b(\square)$ from where she can continue
to play the game. If the pebbles on $p'$ and $p$ have different labels, then Duplicator can use a
safety move in the game between $p'$ and $p$ to get $y'$ in $p'$ with label $b(\square)$. We can then use
Claim 5.3 to get $y''$ with label $b(\square)$ and such that $(p'', y'') \cong^X_{k-2} (p', y')$. This is Duplicator’s
answer. It is correct since the number of rounds left to play is at most $k - 2$. □

As for Lemma 3.2, it is easy to show that for all $k$, $\cong^X_k$ has finite index. Finally, the
following claim is a simple variant of Claim 5.2.

Claim 5.5. Let $X \subseteq H$ and $(p, x)$ be a node. There is an $FO^2(<_v, <_h)$ formula $\psi_{p,x}$ having
one free variable and such that for any forest $s$, $\psi_{p,x}$ holds exactly at all nodes $(p', x')$ such
that $(p, x) \cong^X_k (p', x')$.

Proof. It is immediate that $(p, x) \equiv_k (p', x') \Rightarrow (p, x) \cong^X_k (p', x')$ (following the strategy
provided by $(p, x) \equiv_k (p', x')$ prevents Spoiler from using any safety move in the $X$-relaxed
game). Hence any $\cong^X_k$-class is a union of $\equiv_k$-classes and the result follows. □

6. Decidable Characterization of $FO^2(<_v, <_h)$

In this section we present our decidable characterization for $FO^2(<_v, <_h)$. It involves three
properties of the syntactic morphism of the language. As we explained, the first two are simple
identities on $H$ and $V$; the third one, saturation, is a new notion that is parametrized by
a set $P$ of shallow multicontexts and the associated equivalences. We first define saturation
and then state the characterization.

6.1. Saturation. Let $\alpha$ be a morphism into a finite forest algebra $(H, V)$.
Note that, as for shallow multicontexts, while saturation makes sense for any morphism, it is
designed to be used with leaf surjective ones. In particular, in the characterization, we state
saturation on the leaf completion of the syntactic morphism of the language. Consider some
branching and reduced set $P$ of shallow multicontexts (note that we do not ask $P$ to be
definable). Recall that $P$ will be the set of allowed patterns. Let $H_P$ be the unique maximal
class given by Claim 5.1. Note that since $P$ is reduced, $H_P$ is also the set of $P$-valid forest
types.

Set $k \in \mathbb{N}$. We say that a context $\Delta$ is $(P, k)$-saturated if (i) it is $P$-valid and (ii)
for each port-node $(p, x) \in P$ there exists a port-node $(p', x')$ on the backbone of $\Delta$ such
that $(p, x) \cong^{H_P}_k (p', x')$. We say that $\alpha$ is closed under $k$-saturation if for all branching and
reduced sets $P$ of shallow multicontexts, for all contexts $\Delta$ that are $(P, k)$-saturated and all
$h_1, h_2 \in H_P$, we have:

$$\alpha(\Delta)^\omega h_1 = \alpha(\Delta)^\omega h_2 \qquad (6.1)$$

We say that $\alpha$ is closed under saturation if it is closed under $k$-saturation for some $k$.
We will need the following simple observation.
We will need the following simple observation. 

Lemma 6.1. Let $\alpha$ be a morphism into a finite forest algebra $(H, V)$ and
$k, k'$ two integers such that $k' > k$. If $\alpha$ is closed under $k$-saturation then $\alpha$ is closed under
$k'$-saturation.

Proof. This is immediate since by definition, any $\Delta$ that is $(P, k')$-saturated is $(P, k)$-saturated
as well. □

6.2. Characterization of $FO^2(<_v, <_h)$. We are now ready to state the main result of this
paper.

Theorem 6.2. A regular forest language $L$ is definable in $FO^2(<_v, <_h)$ iff its syntactic
morphism $\alpha$ into $(H, V)$ satisfies the following properties:

a) $H$ satisfies the equation

$$\omega(h + g) + g + \omega(h + g) = \omega(h + g) \qquad (6.2)$$

b) $V$ satisfies the equation

$$(uv)^\omega v (uv)^\omega = (uv)^\omega \qquad (6.3)$$

c) the leaf completion of $\alpha$ is closed under saturation.

Notice that (6.2) and (6.3) above are exactly the identities characterizing, over strings,
definability in $FO^2(<)$ (recall Theorem 4.2) and are therefore necessary for being definable
in $FO^2(<_v, <_h)$. It is easy to see that they are not sufficient to characterize $FO^2(<_v, <_h)$.
To see this, consider the language of trees that correspond to Boolean expressions, with
AND and OR inner nodes and 0 or 1 leaves, that evaluate to 1. One can verify that the
syntactic forest algebra of this language satisfies (6.2) and (6.3). However it is not definable
in $FO^2(<_v, <_h)$, actually not even in $FO(<_v, <_h)$ [Pot94].
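Testing the two identities on a given finite monoid is a direct brute-force computation. The sketch below is only an illustration and not the decision procedure of the paper: it assumes the monoid is given as a list of elements and a binary operation (hypothetical names elements and mul), computes the idempotent power by iterating, and checks $e \cdot v \cdot e = e$ with $e = (uv)^\omega$ for all pairs. For the horizontal monoid $H$ the same check applies with the operation $+$.

```python
from itertools import product


def idempotent_power(x, mul):
    """Return the unique idempotent among the powers x, x^2, x^3, ... of x.

    Terminates whenever x generates a finite semigroup under mul.
    """
    p = x
    while mul(p, p) != p:
        p = mul(p, x)
    return p


def satisfies_identity(elements, mul):
    """Check (uv)^omega v (uv)^omega = (uv)^omega for all u, v in the monoid."""
    for u, v in product(elements, repeat=2):
        e = idempotent_power(mul(u, v), mul)
        if mul(mul(e, v), e) != e:
            return False
    return True
```

As a sanity check, the two-element monoid ({0, 1}, and) satisfies the identity, while the two-element group of integers modulo 2 does not (take u = 0 and v = 1).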

Recall that $FO^2(<_v, <_h)$ can express the fact that a forest is a tree and, for each $k$,
that a tree has rank $k$; hence Theorem 6.2 also applies to regular tree languages and regular
ranked tree languages.

In Section 7 we will prove that the properties listed in the statement of Theorem 6.2 are
necessary for definability in $FO^2(<_v, <_h)$ using a simple but tedious Ehrenfeucht-Fraïssé
argument. In Section 8 we prove the difficult direction of Theorem 6.2, i.e. that the
properties imply definability in $FO^2(<_v, <_h)$.

Finally in Section 9 we show that the properties listed in Theorem 6.2 can be effectively
tested. This is simple for (6.2) and (6.3) but will require an intricate pumping argument
for saturation. Altogether Theorem 6.2 achieves our goal and provides a decidable characterization
of regular forest languages definable in $FO^2(<_v, <_h)$.

7. Correctness of the properties

In this section we prove that the properties stated in Theorem 6.2 are necessary for being
definable in $FO^2(<_v, <_h)$. We prove that any language $L$ definable in $FO^2(<_v, <_h)$ is closed
under saturation and its syntactic forest algebra satisfies the identities (6.2) and (6.3).

If $L$ is definable in $FO^2(<_v, <_h)$ then it is simple to see that its syntactic forest algebra
must satisfy the identities (6.2) and (6.3). This is because Identity (6.2) essentially
concerns sequences of forests with the + operation. Therefore each such sequence can
be treated as a string over $<_h$ and Theorem 4.2 can be applied to show that the identity
is necessary. Similarly Identity (6.3) concerns only sequences of contexts that can also be
treated as strings over $<_v$.

The necessary part of Theorem 6.2 then follows from the following proposition.

Proposition 7.1. Let $L$ be a forest language that is definable in $FO^2(<_v, <_h)$. Then the
leaf completion of the syntactic morphism of $L$ is closed under saturation.

The rest of this section is devoted to the proof of Proposition 7.1. It is a classical but
tedious Ehrenfeucht-Fraïssé argument.

Assume $L$ is definable in $FO^2(<_v, <_h)$. It follows from Theorem 3.1 that $L$ is definable in
$EF + F^{-1}(F_h, F_h^{-1})$. Let $\alpha$ be the leaf completion of its syntactic morphism into $(H, V)$.

Let $k$ be the nesting depth of the navigational modalities used in the formula recognizing
$L$. We prove that $\alpha$ is closed under $k$-saturation.

Let $P$ be a branching and reduced set of shallow multicontexts. Let $X = H_P$ be the
associated class of $P$-valid forest types. Let $\Delta$ be a $(P, k)$-saturated context. Let $u = \alpha(\Delta)$
and $h_1, h_2$ be two forest types in $H_P$. We need to show that $u^\omega h_1 = u^\omega h_2$.

For this we exhibit two forests $S_1$ and $S_2$ over $A$ such that $\alpha(S_1) = u^\omega h_1$ and $\alpha(S_2) =
u^\omega h_2$ and such that Duplicator has a winning strategy for the $k$-move game described in
Section 3.2 when playing on $S_1$ and $S_2$. It then follows from Lemma 3.2 that no formula
of $EF + F^{-1}(F_h, F_h^{-1})$ whose nesting depth of navigational modalities is less than $k$ can
distinguish between the two forests. This implies $u^\omega h_1 = u^\omega h_2$ as desired.

Our agenda is now as follows. In Section 7.1 we define the two forests $S_1$ and $S_2$ on
which we will play. Then in Section 7.3 we give the winning strategy for Duplicator in the
$k$-move game on $S_1$ and $S_2$.

We start with some definitions that will play the key role in the proof. The root of a 
forest is the root of the leftmost tree of that forest. Recall that the backbone of a context is 
the path containing all the ancestors of the port of that context. The skeleton of a context 
is the set of nodes composed of the backbone together with their siblings. Both notions are 
illustrated in Figure 4.




Figure 4: Illustration of the notions of backbone and skeleton. 
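The backbone and skeleton of a context are easy to compute once nodes are addressed by their position. The following Python sketch assumes an encoding that is not from the paper: a context is a tuple of trees (label, children), the port is a childless node labelled "□", a node is identified by the tuple of child indices leading to it, and the backbone is taken to consist of the strict ancestors of the port. The function names are hypothetical.

```python
PORT = "□"  # assumed encoding of the port


def find_port(forest, prefix=()):
    """Return the path (tuple of child indices) to the port, or None."""
    for i, (label, children) in enumerate(forest):
        path = prefix + (i,)
        if label == PORT and not children:
            return path
        found = find_port(children, path)
        if found is not None:
            return found
    return None


def backbone(context):
    """Paths of the strict ancestors of the port, from the root downwards."""
    port = find_port(context)
    if port is None:
        raise ValueError("not a context: no port found")
    return [port[:depth] for depth in range(1, len(port))]


def skeleton(context):
    """Backbone nodes together with all of their siblings."""
    nodes = set()
    for path in backbone(context):
        siblings = context
        for i in path[:-1]:          # walk down to the parent of this node
            siblings = siblings[i][1]
        for j in range(len(siblings)):
            nodes.add(path[:-1] + (j,))
    return nodes
```

For the context $c(a + b(a + \square) + a) + d$, the backbone consists of the $c$-node and the $b$-node, and the skeleton additionally contains $d$ and the two $a$-leaves that are siblings of the $b$-node.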


7.1. Definition of the forests $S_1$ and $S_2$. Let $X = H_P = \{h_1, \ldots, h_\ell\}$. Recall that since
$P$ is reduced, $X$ is also the set of $P$-valid types. In particular all subforests of a $P$-valid
forest or context that are not leaves have a type in $X$. We fix a set $\{s_1, \ldots, s_\ell\}$ of $P$-valid
forests such that for all $i$, $s_i \notin A$ and $\alpha(s_i) = h_i$. This is without loss of generality as for
each $i$, $h_i$ is in $H_P$ and is therefore reachable from any type, so there is a forest of
arbitrary depth of that type.

Given a $P$-valid context $C$ and $\ell$ forests $t_1, \ldots, t_\ell$, we say that $C'$ is the context obtained
from $C$ by replacing all subforests of type $h_j$ by $t_j$ if $C'$ is constructed by considering all
the nodes $x$ that are on the skeleton of $C$ but not on the backbone and, if the subforest
below $x$ is $s$ where $\alpha(s) = h_j$ with $j \leq \ell$, replacing it with $t_j$. By construction, $C$ and $C'$
have the same skeleton. Notice that since $P$ is reduced and $C$ is assumed to be $P$-valid, the
construction replaces the subforests below all port-nodes and $X$-nodes on the skeleton of $C$ that
are not on the backbone and leaves the $\bar{X}$-nodes unchanged. Since we assumed that all $s_i$
are not in $A$, this means that all $X$-nodes on the skeleton of $C$ become port-nodes within
$C'$. Therefore, $C'$ may contain on its backbone shallow multicontexts that are not in $P$ and
saturation may not be preserved. We will show how to deal with this fact later.

Since $P$ is branching, there exists a shallow multicontext $q_0 \in P$ of arity greater than 1.
For $i \leq \ell$, we denote by $V_i$ the context obtained from $q_0$ by placing the forest $s_i$ into all the
ports of $q_0$ except for the rightmost one.

By maximality of $H_P$ relative to $P$-reachability, for all $i \leq \ell$ there exists a $P$-valid
context $U'_i$ such that $h_i = \alpha(U'_i)\,\alpha(V_i \cdots V_1)\,u^\omega h_i$. For $i \leq \ell$, we write $U_i$ for the context
$U'_i V_i \cdots V_1$. For all $i \leq \ell$ we write $u_i = \alpha(U_i)$. By definition, for $i \leq \ell$, the contexts $U_i$ have
the following properties:

• $U_i$ is $P$-valid,

• $u_i u^\omega h_i = h_i$,

• the context $U_i$ contains for all $j \leq i$ a subforest of type $h_j$ such that all nodes on the
path to this copy are port-nodes (namely $s_j$ within $V_j$).

We now construct by induction on $j$ contexts $\Delta_j$ and $U_{i,j}$, and forests $T_{i,j}$ such that for
all $i \leq \ell$, we have $\alpha(\Delta_j) = u$, $\alpha(U_{i,j}) = u_i$ and $\alpha(T_{i,j}) = h_i$. We initialize the process by
setting for all $i \leq \ell$:

• $U_{i,0}$ is formed from $U_i$ by replacing all subforests of type $h_j$ by $s_j$,

• $\Delta_0$ is obtained from $\Delta$ by replacing all subforests of type $h_j$ with $s_j$,

• $T_{i,0} := U_{i,0} \cdot (\Delta_0)^\omega \cdot s_i$.

By construction we have $\alpha(\Delta_0) = u$, $\alpha(U_{i,0}) = u_i$ and $\alpha(T_{i,0}) = u_i u^\omega h_i = h_i$ as desired.
When $j > 0$, the inductive step of the construction is done as follows for all $i \leq \ell$:

• $U_{i,j}$ is formed from $U_i$ by replacing each subforest of type $h_{i'}$ by $T_{i',j-1}$ (see Figure 5),

• $\Delta_j$ is formed from $\Delta$ by replacing each subforest of type $h_{i'}$ by $T_{i',j-1}$.



Figure 5: Illustration of the construction of $U_{i,j}$ from $U_i$: each subforest of type $h_{i'}$ in $U_i$ is
replaced by $T_{i',j-1}$.

By induction we have for all $i \leq \ell$, $\alpha(U_{i,j}) = \alpha(U_i) = u_i$, $\alpha(\Delta_j) = \alpha(\Delta) = u$ and
$\alpha(T_{i,j}) = u_i u^\omega h_i = h_i$ as required. Notice that each $U_{i,j}$ contains a copy of $T_{i',j-1}$ for all
$i' \leq i$. Let $m = 2k$ and let:

$$S_1 := (\Delta_m)^{m\omega} \cdot T_{1,m}$$

$$S_2 := (\Delta_m)^{m\omega} \cdot T_{2,m}$$

Note that by definition $\alpha(S_1) = u^\omega h_1$ and $\alpha(S_2) = u^\omega h_2$. Therefore the following
lemma concludes the proof of Proposition 7.1.

Lemma 7.2. Duplicator has a winning strategy for the $k$-move game between $S_1$ and $S_2$.




















7.2. Basic Properties. Before proving Lemma 7.2 we state some basic properties of the
construction.

Recall that the operation constructing $C'$ from a $P$-valid $C$ by replacing all subforests of
type $h_j$ by $t_j$ does not preserve saturation. The issue is that all $X$-nodes become port-nodes
and the resulting shallow multicontext may no longer be equivalent to one occurring in the
backbone of $C$. In order to cope with this problem, we remember what the initial situation
was. If $x$ is a port-node on the skeleton of $C'$ (recall that $C$ and $C'$ have the same skeleton)
we say that:

• $x$ is an ex-port-node if $x$, as a node of $C$, was a port-node.

• $x$ is an ex-$X$-node if $x$, as a node of $C$, was an $X$-node.

Observe that, by construction, for any port-node $x$ of $C'$, the subtree at $x$ in $C'$ is $b(t_j)$
for some $j$ and some $b \in B$. For ex-$X$-nodes, this is by construction. For ex-port-nodes,
this is because $P$ is reduced and $C$ was $P$-valid, hence all subforests of $C$ but leaves had
type in $H_P$. Also notice that all remaining nodes of $C'$ are $\bar{X}$-nodes and are unchanged
with respect to $C$.

Let $x'_1$ (resp. $x'_2$) be a node on the skeleton of a context $C'_1$ (resp. $C'_2$) constructed
from a $P$-valid context $C_1$ (resp. $C_2$) by replacing all subforests of type $h_i$ by $T_{i,j_1}$ for
some fixed $j_1$ (resp. $T_{i,j_2}$ for some fixed $j_2$). Let $x_1, x_2$ be the corresponding nodes on the
skeletons of $C_1, C_2$. We write $x'_1 \sim_k x'_2$ when $x_1 \cong^X_k x_2$. Because $\cong^X_k$ is an equivalence, it
is straightforward to verify that $\sim_k$ is also an equivalence relation.

There is a game definition of $\sim_k$, called the pseudo-$X$-relaxed game. In this game Spoiler
can now use the safety move when one pebble is on an ex-port-node and the other on an
ex-$X$-node, or when both pebbles are on ex-$X$-nodes with subtrees $b(T_{i_1,j_1})$ and $b(T_{i_2,j_2})$ such that
$i_1 \neq i_2$. In this case Duplicator must answer by placing the other pebble on an ex-port-node.

By construction the following property is an immediate consequence of the $(P, k)$-saturation
of $\Delta$: Assume $C$ is $P$-valid and that $C'$ is constructed from $C$ by replacing
all subforests of type $h_i$ by $T_{i,j}$ for some fixed $j$. Then for any $n$ and any ex-port-node $x$
of $C'$ there exists a $y$ in the backbone of $\Delta_n$ such that $x \sim_k y$. We call this property the
pseudo-saturation of $\Delta_n$.

Notice that when using pseudo-saturation in her strategy, Duplicator may end up in
a situation where the pebbles are above subtrees $T_{i,j_1}$ and $T_{i,j_2}$ with $j_1 \neq j_2$. Note that
$T_{i,j_1}$ and $T_{i,j_2}$ are essentially the same tree, only with different nesting. In this situation,
Duplicator may have to play a subgame within the trees $T_{i,j_1}$ and $T_{i,j_2}$. The following lemma
states that this is possible as soon as $j_1, j_2$ are large enough.

Lemma 7.3. Given integers $i, n, j_1, j_2$ such that $2(n - 1) + 1 \leq j_1, j_2$, Duplicator has a
winning strategy for the $n$-move game between $T_{i,j_1}$ and $T_{i,j_2}$.

Proof. This can be proved by a simple induction on $n$. Essentially this is a straightforward
generalization of a classical argument used to prove that the words $w^{j_1}$ and $w^{j_2}$ cannot be
distinguished by a first-order formula of fixed quantifier rank provided that $j_1, j_2$ are large
enough (see [Str94] for example). □

7.3. The winning strategy: Proof of Lemma 7.2. We give a winning strategy for
Duplicator in the $k$-move game between $S_1$ and $S_2$. In order to be able to formulate this
strategy, we first define the useful parameters and their key properties that we will later
use.

The backbone of $S_1$ ($S_2$) is the path going through all the ports of the copies of $\Delta_m$
within $S_1$ ($S_2$) and the skeleton of $S_1$ ($S_2$) is the set of nodes within the backbone of $S_1$ ($S_2$)
together with their siblings.

The nesting level of a node $x$ of $S_1$ or $S_2$ is the minimal number $j$ such that $x$ belongs
to a context $\Delta_j$ or $U_{i,j}$. We set the nesting level of the nodes that are in any copy of a forest
$s_1, \ldots, s_\ell$ to 0. The notion of nesting level is illustrated in Figure 6. Note that this number
is equal in siblings and may only increase when going up in the tree. When this number is
low, we are near the leaves of the forest and we need to make sure that the current positions
of the game point to isomorphic subtrees. Recall that because of the construction of the
context $U_{i,j}$, a node of nesting level $j$ always has, for all $i' \leq i$, a descendant that is at the
root of a copy of $T_{i',j-1}$ and, for all $j' < j$, a descendant that is at the root of a copy of $\Delta_{j'}$.


Figure 6: Illustration of the notion of nesting level in the proof of Lemma 7.2.

The upward number of a node $x \in S_1$ (or $x \in S_2$) is the number of occurrences of $\Delta_m$
in the path from $x$ to the root of $S_1$ (resp. $S_2$) (see Figure 7). When this number is low, we are near
the roots and we need to make sure the current positions are identical. Fortunately the
two forests $S_1$ and $S_2$ are identical up to a certain depth. This number is equal in siblings.
When moving up in the tree this number may only decrease and, by construction, it can
only decrease when traversing a copy of $\Delta_m$ and therefore the resulting node must be on
the backbone of $S_1$ ($S_2$).

Given a node $x \in S_1$ (or $x \in S_2$), the horizontal number of $x$ is the maximal number
$n \leq k$ such that for all strict ancestors $y$ of $x$, there exists a node $z$ on the backbone of $\Delta_m$
such that $y \sim_n z$. Note that this number is equal in siblings and can only increase when
going up in the tree. Recall also that if $y$ is an ex-port-node, by pseudo-saturation $y \sim_k z$
for some $z$ on the backbone of $\Delta_m$. Hence, if all the strict ancestors of $x$ are ex-port-nodes,
then its horizontal number is $k$. In particular all nodes $x$ in the skeleton of $S_1$ (or $S_2$) have
horizontal number $k$.

Figure 7: Illustration of the notion of upward number in the proof of Lemma 7.2.

We now state a property $\mathcal{P}(n)$ that depends on an integer $n$ and two nodes $x \in S_1$ and
$y \in S_2$. We then show that when $\mathcal{P}(n + 1)$ holds in a game starting at $x, y$, then Duplicator
can play one move while enforcing $\mathcal{P}(n)$. As it is easy to see that $\mathcal{P}(k)$ holds for the roots
of $S_1$ and $S_2$, this will conclude the proof of Lemma 7.2. The inductive property $\mathcal{P}(n)$ is a
disjunction of three cases:

(1) There exist ancestors $\hat{x}, \hat{y}$ of $x, y$ such that $\hat{x}$ and $\hat{y}$ have nesting level $\geq 2n + 1$, upward
number $\geq n$ and horizontal number $\geq n$. Furthermore, either

(a) $\hat{x} \sim_n \hat{y}$ and Duplicator has a winning strategy in the $n$-move game played on the
subtrees at $\hat{x}$ and $\hat{y}$, starting at positions $x$ and $y$, or,

(b) Duplicator has a winning strategy in the $n$-move game played on the subforests of
$\hat{x}$ and $\hat{y}$, starting at positions $x$ and $y$.

(2) The nodes $x$ and $y$ are at the same relative position within the copy of the context
$(\Delta_m)^{m\omega}$ in their respective forest.

(3) The upward numbers of $x$ and $y$ are $\geq n$, their nesting levels are $\geq 2n + 1$ and their
horizontal numbers are $\geq n$. Moreover, we have $x \sim_n y$.

Observe that there is a factor 2 involved in the conditions on the nesting levels of the nodes.
We need this factor in order to be able to use Lemma 7.3.

Assume we are in a situation where $\mathcal{P}(n + 1)$ holds. We show how Duplicator can play
to enforce $\mathcal{P}(n)$. The strategy depends on why $\mathcal{P}(n + 1)$ holds. In all cases we assume that
Spoiler moves the pebble from $x$ in $S_1$. The case when Spoiler moves the pebble from $y$ in
$S_2$ is symmetrical. Recall that $n < k$, and $m = 2k > 2n$.

7.3.1. Case 1. $\mathcal{P}(n + 1)$ holds because of Item (1).

In this case we have two nodes $\hat{x}$ and $\hat{y}$ satisfying the properties of Item (1).

• Spoiler moves from $x$ to a node that is still in the subtree at $\hat{x}$. In that case, Duplicator
simply responds in the subtree at $\hat{y}$ using the strategy provided by Item (1) of $\mathcal{P}(n + 1)$, and
$\mathcal{P}(n)$ holds because Item (1) remains true.

• Spoiler moves to a sibling $x'$ of $\hat{x}$. This can only occur if $x = \hat{x}$ and $y = \hat{y}$. If a) holds, by
hypothesis we have $x \sim_{n+1} y$; therefore Duplicator can answer with a sibling $y'$ of $y$ such
that $x' \sim_n y'$. Since $x'$ ($y'$) is a sibling of $x$ ($y$), it has the same upward number, nesting level
and horizontal number as $x$ ($y$). Hence by hypothesis, all those numbers satisfy Item (3) of
$\mathcal{P}(n)$ and we are done. If b) holds Duplicator simply responds in the subforest of $\hat{y}$ using
the strategy provided by b) and $\mathcal{P}(n)$ holds because Item (1.b) remains true.

• Spoiler moves to an ancestor $x'$ of $\hat{x}$.

Assume first that the upward number of $x'$ is $< n$. Recall that by hypothesis the upward
number of $\hat{x}$ is $\geq n + 1$. Hence $x'$ is on the backbone of $S_1$. As, by hypothesis, $\hat{y}$ has also
an upward number $\geq n + 1$, the copy $y'$ of $x'$ in the other forest is also an ancestor of $\hat{y}$.
Duplicator then selects $y'$ as her answer, satisfying Item (2) of $\mathcal{P}(n)$.

Assume now that the upward number of $x'$ is $\geq n$. Since the horizontal number of $\hat{x}$ is
$\geq n + 1$, there exists a node $z$ on the skeleton of $\Delta_m$ such that $x' \sim_n z$. By hypothesis
the upward number of $\hat{y}$ is $\geq n + 1$. Hence we can find above $\hat{y}$ an occurrence of $\Delta_m$ of
upward number $\geq n$. Duplicator answers by the copy $y'$ of $z$ in this occurrence of $\Delta_m$. By
construction, $x', y'$ have upward numbers $\geq n$. Moreover $x'$ ($y'$) is an ancestor of $\hat{x}$ ($\hat{y}$) and
therefore has a bigger nesting level. As by hypothesis the latter was $\geq 2(n + 1) + 1$, $x'$ and
$y'$ have nesting level $\geq 2n + 1$. For the same reason the horizontal number of $x'$ is larger
than the one of $\hat{x}$ and is therefore $\geq n$. It follows that Item (3) of $\mathcal{P}(n)$ is satisfied.

7.3.2. Case 2. $\mathcal{P}(n + 1)$ holds because of Item (2).

In this case $x$ and $y$ are at the same relative position within the copy of the context
$(\Delta_m)^{m\omega}$ in their respective forest.

• Spoiler moves to a node $x'$ that remains within the context $(\Delta_m)^{m\omega}$. Then Duplicator
copies this move in the other forest and $\mathcal{P}(n)$ is satisfied because of Item (2).

• Spoiler moves to a node $x'$ that is within the subforest $T_{1,m}$ of $S_1$. In particular this means
that $x, y$ are on the backbones of $S_1, S_2$. By construction the subforest $T_{2,m}$ of $S_2$ contains
at least one copy of the forest $T_{1,m-1}$ that can be chosen such that all nodes occurring on
the path to this copy are ex-port-nodes. It follows from Lemma 7.3 that there exists a node
$y'$ in $T_{1,m-1}$ such that Duplicator has a winning strategy in the $n$-move game played on
$T_{1,m}$ and $T_{1,m-1}$, starting at positions $x'$ and $y'$. This is Duplicator’s answer.

Set $\hat{x}$ and $\hat{y}$ as the roots of the copies of $T_{1,m}$ and $T_{1,m-1}$ in $S_1$ and $S_2$. Observe that by
construction, $\hat{x}$ and $\hat{y}$ have nesting level $\geq m - 1 \geq 2n + 1$, upward number $m\omega \geq n$ and
horizontal number $k \geq n$. Moreover, by choice of $y'$, Duplicator has a winning strategy in
the $n$-move game played on the subforests of $\hat{x}$ and $\hat{y}$, starting at positions $x'$ and $y'$. We
conclude that $\mathcal{P}(n)$ holds because of Item (1.b).

7.3.3. Case 3. P(n + 1) holds because of Item (3).

In this case the upward numbers of x and y are > n + 1, their nesting levels are
> 2(n + 1) + 1 and their horizontal numbers are > n + 1. Moreover we have x ≅_{n+1} y.

• Spoiler moves horizontally. Then Duplicator moves according to the winning strategy
provided by x ≅_{n+1} y and Item (3) of P(n) holds.

• Spoiler moves to an ancestor x' of x. 
If the upward number of x' is < n, as the upward number of x is > n + 1, x' must be on
the backbone of S_1. Duplicator answers by the copy y' of x' in the other forest, satisfying
Item (2) of P(n). Note that the upward number of y is > n + 1. Therefore y', having an
upward number < n, is indeed an ancestor of y.

Assume now that the upward number of x' is > n. By hypothesis, the horizontal number of x
is > n + 1; therefore, there exists a node z in the skeleton of Δ_m such that x' ≅_n z. By
hypothesis the upward number of y is > n + 1. Hence we can find above y an occurrence of
Δ_m of upward number n. Duplicator answers by the copy y' of z in this occurrence of Δ_m.
By construction we have x' ≅_n y'. By hypothesis, for x', and by construction, for y', both
have upward number > n. As this is a move up, the nesting level increases and therefore
remains > 2n + 1. Hence Item (3) of P(n) is satisfied.

• Spoiler moves down to some node x'. Note that this means that x is a port-node and
therefore either an ex-A-node or an ex-port-node. Moreover, since x ≅_{n+1} y, this is also
true of y. Set T_{i_x,j_x} and T_{i_y,j_y} as the subforests below x and y. Observe that by
hypothesis j_x, j_y > 2(n + 1). We distinguish two cases:

Assume first that x and y are both ex-A-nodes and i_x = i_y. Set x̂ as x and ŷ as y. Using
Lemma 7.3 we get that Duplicator wins the (n + 1)-move game played on the subtrees at
x̂ and ŷ. This gives Duplicator's answer y' to x' and Item (1.a) of P(n) holds.

Otherwise, we use pseudo-saturation to prove that there exists a node z on the backbone
of Δ_m such that z ≅_n x ≅_n y and provide an answer satisfying Item (1.b) of P(n) for
Duplicator.

When either x or y is an ex-port-node, the existence of z is immediate from
pseudo-saturation and transitivity of ≅_n. In the only remaining case, x and y are both
ex-A-nodes and i_x ≠ i_y. Therefore, Spoiler is allowed to use the safety move in the
pseudo-A-relaxed game x ≅_{n+1} y. Set an ex-port-node z' in the shallow multicontext of y
such that z' ≅_n x. By pseudo-saturation we then obtain z on the backbone of Δ_m such that
z ≅_n z'. By transitivity, we get that z ≅_n x ≅_n y.

We can now describe Duplicator's answer. By hypothesis, x' belongs to T_{i_x,j_x} and by
definition T_{i_y,j_y} contains at least one copy of the forest T_{i_x,j_y-1} that can be chosen
such that all nodes occurring on the path to this copy are ex-port-nodes. Since
j_x, j_y > 2(n + 1), it follows from Lemma 7.3 that there exists a node y' in T_{i_x,j_y-1}
such that Duplicator has a winning strategy in the n-move game played on T_{i_x,j_x} and
T_{i_x,j_y-1}, starting at positions x' and y'. This is Duplicator's answer.

Set x̂ and ŷ as the roots of the copies of T_{i_x,j_x} and T_{i_x,j_y-1} in S_1 and S_2. Observe
that by definition, x̂ and ŷ have nesting level j_x, j_y - 1 > 2n + 1 and upward numbers
> n + 1 > n (the same as x, y). Moreover, all ancestors of x̂, ŷ are either ancestors of
x, y, ex-port-nodes or x, y themselves. Since we proved that there exists z on the backbone
of Δ_m such that z ≅_n x ≅_n y, it follows that x̂, ŷ have horizontal number > n. Finally, by
choice of y', Duplicator has a winning strategy in the n-move game played on the
subforests of x̂ and ŷ, starting at positions x' and y'. We conclude that P(n) holds
because of Item (1.b).

This concludes the proof of Lemma 7.2 and therefore the proof of Proposition 7.1.

8. Sufficiency of the properties 

For this section we fix a regular forest language L recognized by a morphism α into a
finite forest algebra (H, V). Assume that H and V satisfy Identities (6.2) and (6.3) and
that the leaf completion of α is closed under saturation. We prove that any language
recognized by α, including L, is definable in FO²(<v, <h), concluding the proof of
Theorem 6.2.

Recall that given a forest s (a context p) we refer to its image by α as the forest type of
s (the context type of p). In view of Lemma 4.1 we assume without loss of generality that α
itself is leaf surjective and closed under saturation. By definition, this implies in
particular that for each h ∈ H there exists a tree consisting of a single node whose forest
type is h.

As mentioned earlier, we will often manipulate shallow multicontexts modulo ≡_k for
some fixed integer k. We start by defining a suitable k. Given a shallow multicontext q and
a forest s we denote by q[s] the forest constructed from q by placing s at each port of q.

Lemma 8.1. There exists a number k' such that for all k > k', for all shallow multicontexts
p ≡_k p' and for all forests s, p[s] and p'[s] have the same forest type.

Proof. This is a consequence of Theorem 4.2 and the fact that H satisfies Identity (6.2).
Consider strings over H as alphabet and the natural morphism β : H^+ → H. Since
H satisfies Identity (6.2), it follows from Theorem 4.2 that for every h ∈ H, β^{-1}(h) is
definable using a formula φ_h of FO²(<). We choose k' as the maximal rank of all these
formulas.

Let k > k' and take p ≡_k p' and s some forest. Let t_1, ..., t_n be the sequence of trees
occurring in p[s] and t'_1, ..., t'_{n'} be the sequence of trees occurring in p'[s]. For all i
let h_i = α(t_i) and h'_i = α(t'_i). As p ≡_k p' the strings h_1 ... h_n and h'_1 ... h'_{n'}
satisfy the same formulas of FO²(<) of rank k' over the alphabet H. Let
h = β(h_1 ... h_n); by our choice of k' it follows that h'_1 ... h'_{n'} ⊨ φ_h. Hence
β(h_1 ... h_n) = h = β(h'_1 ... h'_{n'}). Therefore α(p[s]) = α(p'[s]). □

As α is closed under saturation, there is an integer k'' such that α is closed under
k''-saturation. We set k as the maximum of k' as given by Lemma 8.1 and k''. By Lemma 6.1
α remains closed under k-saturation. Recall that a set P of shallow multicontexts is
k-definable if it is a union of equivalence classes of ≡_k.

Recall that V^1 is the monoid obtained from V by adding a neutral element 1_V. For
each h ∈ H, v ∈ V^1 and each set P of shallow multicontexts let

L^P_{v,h} = {t | vα(t) = h and t is P-valid}

Our goal in this section is to show that:

Proposition 8.2. For all h ∈ H, all v ∈ V^1 and all sets P of shallow multicontexts, there
exists a language definable in FO²(<v, <h) that agrees with L^P_{v,h} on P-valid forests.

Theorem 6.2 is a direct consequence of Proposition 8.2. Let L' be the union of all
definable languages resulting from applying Proposition 8.2 to all L^P_{v,h} where h ∈ α(L),
v = 1_V and P is the set of all shallow multicontexts. By definition L' is definable in
FO²(<v, <h) and agrees with L on all P-valid forests. Hence L = L' ∪ {a ∈ A | a ∈ L}
which is definable in FO²(<v, <h).

The remainder of this section is devoted to the proof of Proposition 8.2. Assume that
v, h and P are fixed as in the statement of the proposition; we prove that there exists a
definable language that agrees with L^P_{v,h} on P-valid forests. We begin by considering the
special case when P is not branching (i.e. contains only shallow multicontexts of arity 0 or
1). In that case we conclude directly by applying Theorem 4.2.
8.1. Special Case: P is not branching. In this case we treat our forests as strings and
use the known results on strings. Since all shallow multicontexts in P have arity 0 or 1, any
P-valid forest t is of the form:

t = c_1 · · · c_k s

where k is possibly 0 and the c_1, ..., c_k are P-valid shallow multicontexts of arity 1 and
s a P-valid shallow multicontext of arity 0. For each u ∈ V and g ∈ H, consider the
languages:

M_{u,g} = {t | t = c_1 · · · c_k s is P-valid, α(c_1 · · · c_k) = u, and α(s) = g}

Notice that L^P_{v,h} is the union of those languages where vug = h. We show that for any u
and g, there exists a language definable in FO²(<v, <h) that agrees with M_{u,g} on P-valid
forests. This will conclude this case.

By definition shallow multicontexts of arity 1 are contexts. Let {v_1, ..., v_n} be the
context types that are images of shallow multicontexts of arity 1 in P.

Let P' be the set of shallow multicontexts from P of arity 1. Let p, p' ∈ P'; by Lemma 8.1,
if p ≡_k p' then for all forests s, p[s] and p'[s] have the same forest type. Hence, p and p'
have the same context type. This means that for all v_i the set of shallow multicontexts of
context type v_i is k-definable. Therefore, by Claim 5.2 there is a formula θ_{v_i}(x) of
FO²(<v, <h) testing whether the shallow multicontext of x has v_i as context type.

Let Γ = {d_1, ..., d_n} be an alphabet and define a morphism β : Γ* → V by β(d_i) = v_i.
Since V satisfies Identity (6.3), for each u ∈ V there is an FO²(<) formula φ_u such that
the strings of Γ* satisfying φ_u are the strings of type u under β. From φ_u we construct a
formula of FO²(<v, <h) defining all P'-valid contexts having u as context type. This is
done by replacing in φ_u all atomic formulas P_{d_i}(x) with θ_{v_i}(x). We can also easily
define in FO²(<v, <h) the set of shallow multicontexts s of arity 0 such that α(s) = g. After
combining this last formula with the formula constructed from φ_u we get the desired
language definable in FO²(<v, <h) and agreeing with M_{u,g} on P-valid forests.

In the remainder of the proof we assume that P is branching, i.e. it contains at least one
shallow multicontext of arity at least 2. Recall that by Claim EH it follows that there
exists a unique maximal P-reachable class H_P. The rest of the proof is by induction on
three parameters that we now define.

8.2. Induction Parameters. The first and most important of our induction parameters
is the size of the set of P-valid forest types. We denote this set by X. Observe that by
definition H_P ⊆ X.

Our second parameter is an index defined on sets P of shallow multicontexts. During
the proof we will construct from P new sets P' by replacing some of their port-nodes with
X-nodes. Our definition ensures that the index of P' will be smaller than the index of P,
hence guarantees termination of the induction. It is based on the following preorder on
shallow multicontexts called simulation modulo X.

Given two shallow multicontexts p and p', we say that p simulates p' modulo X if p' is
obtained from p by replacing some of its port-nodes b(□) by an X-node b(a) with the same
inner label. Observe that simulation modulo X is a partial order.

For each shallow multicontext p its X-number is the number of non ≡_{k+2}-equivalent
shallow multicontexts q (not necessarily in P) that can be simulated modulo X by p. For
each set P of shallow multicontexts the n-index of P is the number of non ≡_{k+2}-equivalent
shallow multicontexts p ∈ P of X-number n. Our second induction parameter, called the
index of P, is the sequence of its n-indexes ordered by decreasing n.
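The definition of the index can be illustrated by a small sketch. It assumes that two
indexes are compared lexicographically, which the text does not state explicitly but which
is the natural reading of "ordered by decreasing n". The function name index_of and the
numeric data are hypothetical; each input number stands for the X-number of one
equivalence class of shallow multicontexts in the set.

```python
from collections import Counter

def index_of(x_numbers, max_n):
    """Index of a set of shallow multicontexts (illustrative encoding).

    x_numbers: the X-number of one representative per equivalence class.
    max_n: a common upper bound on X-numbers, so that tuples are comparable.
    Returns the n-indexes listed by decreasing n, from max_n down to 0.
    """
    counts = Counter(x_numbers)
    return tuple(counts.get(n, 0) for n in range(max_n, -1, -1))

# Hypothetical data: P has two classes of X-number 3 and one of X-number 1.
p_index = index_of([3, 3, 1], max_n=3)
# Q replaces one class of X-number 3 by three classes of X-number 2.
q_index = index_of([3, 2, 2, 2, 1], max_n=3)

# Assumption: indexes are compared lexicographically (Python tuple order).
print(p_index, q_index, q_index < p_index)
```

Under this reading, trading one class of maximal X-number for any number of classes of
smaller X-number decreases the index, which is the situation exploited later in the proof.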

The third parameter is based on v. Consider the preorder on context types defined by 
the quotient of the P-reachability relation by the P-equivalence relation. The P-depth of a 
context type v is the maximal length of a path in this preorder from the empty context to v. 

We prove Proposition 8.2 by induction on the following three parameters, given below
in their order of importance:

(i) |X|

(ii) the index of P

(iii) the P-depth of v

We distinguish three cases: a base case and two cases in which we will use antichain
composition and induction. We say that a context type u P-preserves v if v is P-reachable
from vu. A context c P-preserves v if its context type P-preserves v.

8.3. Base Case: P is reduced and v is P-preserved by a (P, k)-saturated context
Δ. We use saturation to prove that v is constant over the set X = H_P of P-valid forest
types, i.e. for any P-valid h_1, h_2 ∈ H, vh_1 = vh_2. Since all forests in L^P_{v,h} are
P-valid, it follows that L^P_{v,h} is either empty or the language of all P-valid forests. The
desired definable language is therefore either the empty language or the language of all
forests.

Since Δ P-preserves v, there exists a P-valid context c such that vα(Δc) = v. It follows
that v = vα(Δc)^ω. Moreover, observe that since Δ is P-saturated and c is P-valid, Δc is
P-saturated as well. It then follows from saturation that

vh_1 = vα(Δc)^ω h_1 = vα(Δc)^ω h_2 = vh_2

This terminates the proof of the base case. We now consider two cases in which we
conclude by induction.
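The step from vα(Δc) = v to v = vα(Δc)^ω only uses the fact that x^ω is a power of x. The
following sketch illustrates this in a toy finite monoid; the function idempotent_power and
the choice of the integers modulo 6 under multiplication are illustrative assumptions and
are unrelated to the algebra (H, V) of the paper.

```python
def idempotent_power(x, mul):
    """Return x^ω, the unique idempotent among the powers x, x^2, x^3, ...

    mul must be the associative product of a finite semigroup, which
    guarantees that the loop reaches an idempotent power and stops.
    """
    power = x
    while mul(power, power) != power:
        power = mul(power, x)
    return power

def mul_mod6(a, b):
    # Toy monoid: integers modulo 6 under multiplication (an assumption).
    return (a * b) % 6

# If v·x = v then v·x^k = v for every k, hence in particular v·x^ω = v.
for v in range(6):
    for x in range(6):
        if mul_mod6(v, x) == v:
            assert mul_mod6(v, idempotent_power(x, mul_mod6)) == v

print(idempotent_power(2, mul_mod6), idempotent_power(5, mul_mod6))
```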

8.4. Case 1: P is not reduced, Bottom-Up Induction. By definition, since P is
not reduced there exists a P-valid forest type g ∈ X \ H_P. We choose g to be minimal
with respect to P-reachability, i.e., any P-valid forest type g' is either P-equivalent to g
or g is not P-reachable from g'. Let G be the set of P-valid forest types that are
P-equivalent to g. Observe that by minimality of g, in any P-valid forest s whose type is
in G, all subforests of s that are not single leaves have a forest type in G (recall that a
subforest consists of all the children of some node). Moreover, by choice of g,
G ∩ H_P = ∅ and all g' ∈ X such that G is P-reachable from g' are in G. We obtain the
desired definable language for L^P_{v,h} via the Antichain Composition Lemma using languages
that we prove to be definable in FO²(<v, <h) by induction on |X|. Correctness of the
construction relies on both Equation (6.2) and Equation (6.3).

Outline. Our agenda is as follows. We construct from P a k-definable set P' and prove
that a P-valid forest has a type in G iff it contains no shallow multicontext of P'. Since
k-definable sets of shallow multicontexts can be expressed in FO²(<v, <h), we use P' to
define an antichain formula φ which selects all positions whose subforest contains a shallow
multicontext in P' (i.e. has a forest type outside G) but have no descendant with that
property (i.e. all descendants of the position have a subforest of type in G). This formula
splits P-valid forests into two parts: a lower part and an upper part. In the lower part all
subforests have type in G ⊊ X and in the upper part the set of valid forest types is included
in X \ G. In both cases we get definable languages by induction on |X|, and we glue them
back together using the Antichain Composition Lemma. This situation is illustrated in
Figure 8.
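The way such a formula selects an antichain can be pictured with a small sketch. It is
only an illustration under simplifying assumptions: forests are encoded with a
hypothetical Node class, and the property "the subforest contains a shallow multicontext
of P'" is replaced by an arbitrary predicate called marked.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    label: str
    children: list = field(default_factory=list)

def lowest_marked(forest, marked):
    """Return the nodes satisfying marked that have no descendant satisfying it.

    No selected node is an ancestor of another one, so the result is an antichain.
    """
    selected = []

    def visit(node):
        # True iff the node or one of its descendants is marked.
        marked_below = False
        for child in node.children:  # visit every child, no short-circuit
            if visit(child):
                marked_below = True
        here = marked(node)
        if here and not marked_below:
            selected.append(node)
        return here or marked_below

    for root in forest:
        visit(root)
    return selected

# Hypothetical forest a(b(c, d), e(f)) where b, c and e carry the property.
tree = Node("a", [Node("b", [Node("c"), Node("d")]), Node("e", [Node("f")])])
cut = lowest_marked([tree], lambda n: n.label in {"b", "c", "e"})
print([n.label for n in cut])
```

Everything strictly below the selected nodes forms the lower part of the forest, and the
rest forms the upper part.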

Figure 8: Illustration of the Antichain Composition Lemma for Case 1. The marked nodes are
the lowest nodes whose subforest contains a shallow multicontext in P'. In the upper part,
all forest types are in X \ G.

Definition of P'. Let s be some arbitrarily chosen P-valid forest such that α(s) ∈ G. We
set

P' = {p | α(p[s]) ∉ G}

We prove in the next lemma that P' is well-defined, i.e. that its definition does not depend
on the choice of s.

Lemma 8.3. Let p be a shallow multicontext of arity n and T and T' be two sequences of
n P-valid forests of forest type in G. We have:

α(p[T]) ∈ G ⇔ α(p[T']) ∈ G

Proof. We use Identity (6.3) to prove this lemma. Let T = (t_1, ..., t_n) and
T' = (t'_1, ..., t'_n).

For i ∈ [1, n] we write c_i the context obtained from p[T'] by replacing t'_i by a port and
t'_j by t_j for j > i. Notice that by hypothesis on p, T and T', c_i is P-valid for all i ≤ n.
For all i ≤ n, we write u_i = α(c_i), h_i = α(t_i) and h'_i = α(t'_i). We first show that:

∀i ≤ n, u_i h_i ∈ G ⇔ u_i h'_i ∈ G    (8.1)

Assuming that u_i h_i ∈ G, we show that u_i h'_i ∈ G. By symmetry this will prove (8.1). As G
is closed under mutual P-reachability, it is enough to show that u_i h'_i is mutually
P-reachable from h'_i. By definition u_i h'_i is P-reachable from h'_i, therefore it remains to
show that h'_i is P-reachable from u_i h'_i. From u_i h_i ∈ G we get that h'_i is P-reachable
from u_i h_i and therefore there is a P-valid context u such that h'_i = u u_i h_i. By
hypothesis h_i is P-reachable from h'_i and therefore there exists a P-valid context u' such
that h_i = u' h'_i. A little bit of algebra and Identity (6.3) yields:

h'_i = u u_i u' h'_i
     = u u_i (u' u u_i)^ω u' h'_i
     = u u_i (u' u u_i)^ω u u_i (u' u u_i)^ω u' h'_i    using Identity (6.3)
     = (u u_i u')^ω u u_i (u u_i u')^{ω+1} h'_i
     = (u u_i u')^ω u u_i h'_i

As (u u_i u')^ω u is P-valid, h'_i is P-reachable from u_i h'_i and (8.1) is proved.

For concluding the proof of the lemma, notice that by construction α(p[T]) = u_1 h_1,
α(p[T']) = u_n h'_n and u_i h'_i = u_{i+1} h_{i+1}. As from (8.1) we get u_i h_i ∈ G iff
u_i h'_i ∈ G, this implies by induction on i that for all i ≤ n, u_1 h_1 ∈ G iff u_i h_i ∈ G iff
u_i h'_i ∈ G. The case i = n proves the lemma. □

We now prove that P' can be used to test whether a P-valid forest has a type in G. 

Lemma 8.4. A P-valid forest has type in G iff it is (P \ P')-valid. 

Proof. This is a consequence of Lemma 8.3. Let t be a P-valid forest such that α(t) ∉ G.
We prove that t contains a shallow multicontext of P'. Consider a minimal subforest t' of
t whose type is not in G. Then we have t' = p[T] where p is a shallow multicontext and T
a sequence of forests of forest type in G (possibly empty if p is of arity 0). Let T' be the
sequence (s, ..., s) for some s with α(s) ∈ G. By Lemma 8.3, α(p[T']) ∉ G and therefore
p ∈ P'.

Conversely, if α(t) ∈ G, by minimality of g, all subforests of t have a type in G. It is then
immediate by definition of P' and Lemma 8.3 that t cannot contain a shallow multicontext
of P'. □

Setting up the Composition. Let φ be the antichain formula which holds at port-nodes
(p, x) such that p ∈ P' and x has no descendant with that property. It follows from the
next lemma that φ is expressible in FO²(<v, <h).

Lemma 8.5. P' is k-definable.

Proof. This is a consequence of Lemma 8.1 (which is itself a consequence of (6.2)). Set
p ∈ P' and p' ≡_k p. We prove that p' ∈ P'. By definition of P', α(p[s]) ∉ G. As p ≡_k p', by
choice of k and Lemma 8.1 we get α(p'[s]) = α(p[s]). Hence α(p'[s]) ∉ G and p' ∈ P'. □

We now define the languages that we will use to apply the Antichain Composition 
Lemma. 

Lemma 8.6. For any g ∈ G, there exists a language definable in FO²(<v, <h) that agrees
with L^P_{1_V,g} on P-valid forests.

Proof. Notice that the set of all (P \ P')-valid forest types is G ⊊ X. Hence by induction on
the first parameter in Proposition 8.2 there exists a language definable in FO²(<v, <h)
that agrees with L^{P\P'}_{1_V,g} on (P \ P')-valid forests. By Lemma 8.4 a P-valid forest has
type in G iff it is (P \ P')-valid. Hence this language agrees with L^P_{1_V,g} on P-valid
forests. □

Assume G = {g_1, ..., g_n}. For all i ≤ n, let L_i be a language definable in FO²(<v, <h)
that agrees with L^P_{1_V,g_i} on P-valid forests given by Lemma 8.6.

Let Q be the set of shallow multicontexts q that can be obtained from some p ∈ P by
replacing some port-nodes (possibly none) with G-nodes of the same inner label and such
that either:

• q has arity greater than 1 (i.e. one port-node of p was left unchanged)

• or q has arity 0 and p ∈ P' (hence α(q) ∉ G).

We have:

Lemma 8.7. There is a language K definable in FO²(<v, <h) that agrees with L^Q_{v,h} on
Q-valid forests.

Proof. Let Y be the set of Q-valid forest types. We prove that Y ⊆ X and Y ∩ G = ∅. It
will follow that |Y| < |X|. Hence K is obtained by applying Proposition 8.2 by induction
on the first parameter.

Let h ∈ Y; by definition, there exists a Q-valid forest s such that α(s) = h. All shallow
multicontexts q ∈ Q occurring in s are constructed from p ∈ P by replacing some
port-nodes of p with G-nodes. As G contains only P-valid forest types, for any g ∈ G there
exists a P-valid forest whose type is g. By replacing the newly introduced G-nodes in s
by the corresponding P-valid forest with the same type we get a P-valid forest s' whose type
remains h. Hence h ∈ X. Moreover, for any shallow multicontext of arity 0 occurring in s,
the corresponding shallow multicontext occurring in s' must belong to P'. It follows that
s' contains at least one shallow multicontext in P' and by Lemma 8.4 that h ∉ G. □

Applying Antichain Composition. We now apply the Antichain Composition Lemma
to the languages K and L_1, ..., L_n defined above. The situation is depicted in Figure 8.
Recall that G = {g_1, ..., g_n}. For any i ≤ n, let a_i ∈ A be such that α(a_i) = g_i. Set
L = {t | t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n] ∈ K}. Since K, L_1, ..., L_n are definable in
FO²(<v, <h), it follows from Lemma 3.3 that L is definable in FO²(<v, <h). We terminate
the proof by proving that L agrees with L^P_{v,h} on P-valid forests.

Lemma 8.8. Let t be a P-valid forest, then α(t) = α(t[(L_1, φ) → a_1, ..., (L_n, φ) → a_n]).

Proof. This is immediate by definition of Q, K and L_1, ..., L_n. □

8.5. Case 2: P is reduced but there exists no (P, k)-saturated context Δ that
P-preserves v, Top-Down Induction. In this case we use again the Antichain Composition
Lemma using languages that we prove to be definable by induction on the index of P and the
P-depth of v. Recall that since P is reduced, X = H_P. Correctness relies on Identity (6.3).

Outline. We proceed as follows. First we use our hypothesis to define a port-node
(p, x) ∈ P with the following properties. For any P-valid forest t and port-node (p', x') of
t such that (p, x) ≅^X_k (p', x'), the context c obtained from t by replacing the subforest
below x' by a port does not P-preserve v. Since by Claim 5.5 all such nodes (p', x') can be
defined in FO²(<v, <h), this gives an antichain formula φ which selects such nodes having
no ancestor with that property. Such a formula splits a forest in two parts: an upper part
and a lower part. For the upper part, we will prove that the set of occurring shallow
multicontexts has smaller index and use induction on that parameter. Moreover, observe
that by choice of (p, x), each subforest in the lower part is below a context that has larger
P-depth than v; we will use induction on this parameter. This situation is depicted in
Figure 9.

Figure 9: Illustration of the Antichain Composition Lemma for Case 2. The marked nodes
are the topmost nodes equivalent to (p, x).

Definition of (p, x). Let (p, x) be a port-node in P. We say that (p, x) is P-bad for v iff
there exists no P-valid context c satisfying the two following properties:

(1) c P-preserves v.

(2) the port-node (p', x') above the port of c verifies (p, x) ≅^X_k (p', x').

Lemma 8.9. There exists a port-node (p, x) ∈ P that is P-bad for v.

Proof. This is where Identity (6.3) is used. We proceed by contradiction and assume that
no port-node (p, x) ∈ P is P-bad for v.

By definition, for all port-nodes (p, x) ∈ P we get a P-valid context c_{p,x} that P-preserves
v and such that the port-node (p', x') above the port of c_{p,x} verifies (p, x) ≅^X_k (p', x').
Note that since ≅^X_k is of finite index, we may assume that there are finitely many different
contexts c_{p,x} for all (p, x) ∈ P. Let Δ be the context obtained by concatenating all these
finitely many contexts c_{p,x} for all (p, x). By definition Δ is (P, k)-saturated. We use
Identity (6.3) to prove that Δ P-preserves v, which contradicts the hypothesis of this case.
This is an immediate consequence of the next claim.

Claim 8.10. Let u, u' ∈ V such that both u and u' P-preserve v. Then uu' P-preserves v.

We finish the proof of Lemma 8.9 by proving Claim 8.10. By hypothesis, we have
w, w' ∈ V that are P-valid and such that vuw = v and vu'w' = v. Set e = (wu'w'u)^ω; a
little algebra yields vue = vu. Applying Identity (6.3), we get that

vu = vue = vueu'w'ue = vuu'w'ue

Hence v = vuu'w'uew and since w'uew is P-valid, this terminates the proof. □

Setting-up the Composition. For the remainder of the proof we set (p, x) ∈ P as a
port-node which is P-bad for v as given by Lemma 8.9. We define our antichain formula
φ as the formula holding exactly at all port-nodes (p', x') such that (p, x) ≅^X_k (p', x') and
having no ancestor with that property. By definition, φ is antichain and by Claim 5.5 φ is
expressible in FO²(<v, <h). We now define the languages L_1, ..., L_ℓ and K necessary for
applying the Antichain Composition Lemma.

Given two elements g and g' of H, we say that g is v⁺-equivalent to g' if for all context
types u which do not P-preserve v (hence the P-depth of vu is strictly higher than the
P-depth of v) we have vug = vug'. Set {γ_1, ..., γ_ℓ} as the set of all v⁺-equivalence classes.
For all i, we define M_i = {s | α(s) ∈ γ_i and s P-valid}.

Lemma 8.11. For all i, there is a language L_i definable in FO²(<v, <h) that agrees with
M_i on P-valid forests.

Proof. Fix a v⁺-equivalence class γ_i and let g ∈ γ_i. For any u such that vu is not
P-reachable from v, by induction in Proposition 8.2 (the third parameter has increased and
the other two are unchanged), there is a language K_u definable in FO²(<v, <h) that agrees
with L^P_{vu,vug} on P-valid forests. The lemma then follows by taking for L_i the intersection
of all languages K_u for u such that vu is not P-reachable from v. □

Let P' = {p' | p ≡_{k+2} p'}. Observe that by Claim 5.3 any shallow multicontext p' in
P' contains at least one position x' such that (p, x) ≅^X_k (p', x'). For p' ∈ P', let
x_1, ..., x_ℓ be all the port-nodes of p' such that (p, x) ≅^X_k (p', x_j). Let b(□) be the label
of all the x_i in p'. Let A_{p'} be the set of all the shallow multicontexts that are constructed
from p' by replacing at all the positions x_j, b(□) by a label b(a) (possibly different for each
position), for a such that α(a) ∈ X. Let P̂ be the union of all A_{p'} for p' ∈ P'. Finally, let
Q = (P \ P') ∪ P̂.

Lemma 8.12. There is a language K definable in FO²(<v, <h) that agrees with L^Q_{v,h} on
Q-valid forests.

Proof. Let Y be the set of all Q-valid forest types. We first observe that Y ⊆ X. The
argument is similar to the one in the proof of Lemma 8.7 as the newly introduced shallow
multicontexts can be represented by P-valid forests. If Y ⊊ X, then the lemma follows
by induction on the first parameter in Proposition 8.2. Otherwise X = Y and we prove
that Q has smaller index than P. The result is then immediate by induction on the second
parameter in Proposition 8.2. Note that this is where we use the fact that our notion of
equivalence between positions is weaker than ≡_k. With a stronger notion, it would not be
possible to prove that the index has decreased.

Set n ∈ ℕ as the largest integer such that there exists a shallow multicontext in P' with
X-number n. We prove that all p̂ ∈ P̂ have an X-number that is strictly smaller than n. It
will then be immediate from the definitions that Q has smaller index than P.

Let p̂ ∈ P̂. By definition, there exists p' ∈ P' and x_1, ..., x_ℓ ∈ p' such that for all i,
(p, x) ≅^X_k (p', x_i) and replacing the labels b(□) at all positions x_i in p' by b(a) for
α(a) ∈ X yields p̂. In particular this means that p' simulates p̂ modulo X and that the
X-number of p̂ is smaller or equal to that of p' and hence smaller or equal to n. We prove
that p̂ does not simulate any p'' ≡_{k+2} p' modulo X. It will follow that the inequality is
strict which terminates the proof.

We proceed by contradiction, assume that there exists p'' ≡_{k+2} p' such that p̂ simulates
p'' modulo X. By definition, p'' ≡_{k+2} p' ≡_{k+2} p, hence, by Claim 5.3, p'' contains a
port-node x'' such that (p'', x'') ≅^X_k (p, x). By definition of simulation, x'' corresponds to
a port-node x̂ in p̂ and a port-node x' in p'. Moreover, since x̂ is a port-node, this means
that x' ∉ {x_1, ..., x_ℓ}, i.e. (p', x') is not ≅^X_k-equivalent to (p, x). This contradicts the
following claim.

Claim 8.13. (p'', x'') ≅^X_k (p', x') ≅^X_k (p, x).

It remains to prove Claim 8.13. We prove that (p'', x'') ≅^X_k (p', x'). By definition, p'
and p'' use the same set of labels in A_s and x', x'' have the same label b(□). We give a
winning strategy for Duplicator in the X-relaxed game between (p'', x'') and (p', x'). By
definition p'' is obtained from p' by replacing some port-nodes with X-nodes with the same
inner-node label. Therefore as long as Spoiler does not use a safety move, Duplicator can
answer by playing the isomorphism. Assume now that Spoiler does a safety move. Then
the pebbles are on positions z' ∈ p' and z'' ∈ p'' with labels b(□) and b(a) as Duplicator's
strategy disallows any other possibility such as b(a), b(a') where a ≠ a'. If Spoiler selects
z'', then Duplicator continues to play the isomorphism by leaving the other pebble on z'.
If Spoiler selects z', observe that since p' ≡_{k+2} p'', we can use Claim 5.3 and get a node
y'' ∈ p'' such that (p', x') ≅^X_k (p'', y''); this is Duplicator's answer. Duplicator can then
continue to play by using the strategy given by (p', x') ≅^X_k (p'', y''). □

Applying Antichain Composition. Let K and L_1, ..., L_ℓ be languages definable in
FO²(<v, <h) as given by Lemma 8.12 and Lemma 8.11. For all i let a_i ∈ A be such
that α(a_i) ∈ γ_i. Set L = {t | t[(L_1, φ) → a_1, ..., (L_ℓ, φ) → a_ℓ] ∈ K}. It follows from
Lemma 3.3 that L is definable in FO²(<v, <h). We terminate the proof by proving that L
agrees with L^P_{v,h} on P-valid forests.

Lemma 8.14. For any P-valid forest t, vα(t) = vα(t[(L_1, φ) → a_1, ..., (L_ℓ, φ) → a_ℓ]).

Proof. This is because (p, x) is P-bad for v. The proof goes by induction on the number
of occurrences in t of port-nodes (q, y) such that (q, y) ≅^X_k (p, x). If there is no
occurrence, this is immediate as the substitution does nothing.

Consider a node y of a shallow multicontext q such that (q, y) ≅^X_k (p, x) and no node
above y satisfies that property. Let s be the subforest below y in t and let i be such that
α(s) ∈ γ_i. Let c be the context formed from t by replacing s by a port and let u_c be its
type. Since (p, x) is P-bad for v, u_c does not P-preserve v. Hence,
vα(t) = v u_c α(s) = v u_c α(a_i) by definition of v⁺-equivalence.

We write t' = c a_i; we already know that vα(t') = vα(t). Observe that by construction
t'[(L_1, φ) → a_1, ..., (L_ℓ, φ) → a_ℓ] is t[(L_1, φ) → a_1, ..., (L_ℓ, φ) → a_ℓ]. By induction we
have that vα(t') = vα(t'[(L_1, φ) → a_1, ..., (L_ℓ, φ) → a_ℓ]) which terminates the proof. □


9. Decidability 

In this section we prove that the characterization of FO²(<v, <h) given in Theorem 6.2 is
decidable.

Theorem 9.1. Let L be a regular language of forests. It is decidable whether L is definable
in FO²(<v, <h).

In view of Theorem 6.2 the decision procedure works as follows. From L we first
compute its syntactic morphism α into (H, V). Then we check that (6.2) holds in H,
that (6.3) holds in V and that α is closed under saturation. This is straightforward for (6.2)
and (6.3) as H and V contain only finitely many elements. However it is not obvious from
the definitions that closure under saturation can be decided. The main result of this section 
is an algorithm which, given as input a morphism a, decides whether a is closed under 
saturation. 
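To make the "straightforward" part concrete, here is a minimal sketch of a brute-force
identity check over a finite monoid given by its product. The identities (6.2) and (6.3)
themselves are stated elsewhere in the paper; the shape tested below, e · b · e = e with
e = (ab)^ω, is an assumption modeled on the way Identity (6.3) is applied in the proofs of
Lemma 8.3 and Claim 8.10. All function names and the two toy monoids are hypothetical.

```python
from itertools import product

def omega(x, mul):
    """Idempotent power x^ω in a finite semigroup with product mul."""
    power = x
    while mul(power, power) != power:
        power = mul(power, x)
    return power

def satisfies(elements, mul, identity):
    """Check a two-variable identity on all pairs of elements.

    identity(a, b, mul) must return the pair (left-hand side, right-hand side).
    """
    return all(lhs == rhs
               for lhs, rhs in (identity(a, b, mul)
                                for a, b in product(elements, repeat=2)))

def assumed_identity(a, b, mul):
    # Assumed shape: (ab)^ω · b · (ab)^ω = (ab)^ω.
    e = omega(mul(a, b), mul)
    return mul(mul(e, b), e), e

# Toy monoids (not taken from the paper).
semilattice = ([0, 1], lambda a, b: a * b)      # {0, 1} under multiplication
group_z2 = ([0, 1], lambda a, b: (a + b) % 2)   # integers modulo 2 under addition

print(satisfies(*semilattice, assumed_identity))
print(satisfies(*group_z2, assumed_identity))
```

The check enumerates |M|² pairs and computes one idempotent power per pair, so it
terminates for any finite monoid; the difficulty addressed in the rest of the section lies
entirely in closure under saturation.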
Recall the definition of saturation. It requires the existence of a number k such that for
all branching and reduced sets P of shallow multicontexts and all (P, k)-saturated contexts
a property holds. The main problem is that all these quantifications range over infinite sets.
In the first part of this section we introduce an "abstract" version of these sets with finitely
many objects together with an associated "abstract" notion of saturation and show that
closure under saturation corresponds to closure under the abstract notion of saturation.

Then, in the remaining part of the section we present an algorithm that computes the 
needed abstract sets. 

9.1. Abstraction. Let $\mathbb{A} = (A,B)$ be a finite alphabet and $\alpha : \mathbb{A}^\Delta \to (H,V)$ be a morphism
into a finite forest algebra $(H,V)$. Recall that we see shallow multicontexts as strings over
$\mathbb{A}_s$, i.e. as elements of $\mathbb{A}_s^+$. In order to stay consistent with our notation on shallow
multicontexts, we denote by $+$ the concatenation within $\mathbb{A}_s^+$. Recall that if $Q \subseteq \mathbb{A}_s^+$
is a set of shallow multicontexts, then we write $(p,x) \in Q$ to mean that $x$ is a node of some
shallow multicontext $p \in Q$.

We start with some terminology. Let $p$ be a shallow multicontext of arity $n$ and let
$G \subseteq H$. We denote by $p(G)$ the set of forest types $h \in H$ such that there exists a sequence
$T$ of $n$ forests which all have a type in $G$ and such that $\alpha(p[T]) = h$. For a port-node $x$ of
$p$, we denote by $p(G,x)$ the set of context types $v \in V$ such that there exists a sequence $T$
of $n-1$ forests which all have a type in $G$ and such that $\alpha(p[T,x]) = v$. If $x$ is not a port-node
of $p$ then we set $p(G,x) = \emptyset$ for all $G$. The following fact is immediate.

Fact 9.2. Let $(p,x)$ and $(q,y)$ be nodes and $r = p + q$. Then for any $G \subseteq H$, $r(G) =
p(G) + q(G)$, $r(G,x) = p(G,x) + q(G)$ and $r(G,y) = p(G) + q(G,y)$.

Abstracting shallow multicontexts: Profiles. We now define an abstract version of 
positions in shallow multicontexts that we call profiles. 

Consider a pair $(p,x)$ where $p$ is a shallow multicontext and $x$ a position in $p$. The
profile of $(p,x)$, denoted $\beta(p,x)$, is the quadruple $v = (i, \mathbb{B}_s, f_H, f_V)$ where

(1) $i \in \{0,1,2\}$ is the arity of $p$ counted up to threshold 2,

(2) $\mathbb{B}_s \subseteq \mathbb{A}_s$ is the alphabet of $p$, i.e. the set of labels used in $p$,

(3) $f_H : 2^H \to 2^H$ is the forest mapping of $p$, defined as the mapping $G \mapsto p(G)$,

(4) $f_V : 2^H \to 2^V$ is the context mapping of $(p,x)$, defined as the mapping $G \mapsto p(G,x)$.

Observe that if $p$ has arity 0 (i.e. $p$ is a forest) then $f_H$ is the mapping $G \mapsto \{\alpha(p)\}$.
Moreover, whenever $x$ is not a port-node, $f_V$ is the mapping $G \mapsto \emptyset$. We let $\mathbb{P}$ be the set
of profiles of all shallow multicontexts. Observe that $\mathbb{P}$ is finite:

$\mathbb{P} \subseteq \{0,1,2\} \times 2^{\mathbb{A}_s} \times (2^H)^{2^H} \times (2^V)^{2^H}$

In the rest of this section we shall denote by $u, v, \dots$ profiles (elements of $\mathbb{P}$), by $U, V, \dots$
sets of profiles (subsets of $\mathbb{P}$) and by $\mathcal{U}, \mathcal{V}, \dots$ sets of sets of profiles (subsets of $2^{\mathbb{P}}$).

Let us first present two semigroup operations for $\mathbb{P}$. Both operations are adapted
from the concatenation operation between shallow multicontexts. If $(p,x)$ and $(p',x')$ are
pairs where $p,p'$ are shallow multicontexts and $x,x'$ are positions of $p,p'$, then one can use
concatenation to construct two new pairs: $(p + p', x)$ in which we keep the position $x$ of $p$
and $(p + p', x')$ in which we keep the position $x'$ of $p'$.

Abstracted on profiles, this yields the two following operations. Let $v, v' \in \mathbb{P}$ be two
profiles and set $(i, \mathbb{B}_s, f_H, f_V) = v$ and $(i', \mathbb{B}'_s, f'_H, f'_V) = v'$. We define two new profiles
$v +_\ell v' \in \mathbb{P}$ and $v +_r v' \in \mathbb{P}$ as follows.

$v +_\ell v' = (j, \mathbb{C}_s, g_H, g_V)$ with

• $j = \min(i + i', 2)$

• $\mathbb{C}_s = \mathbb{B}_s \cup \mathbb{B}'_s$

• $g_H : G \mapsto f_H(G) + f'_H(G)$

• $g_V : G \mapsto f_V(G) + f'_H(G)$

$v +_r v' = (j, \mathbb{C}_s, g_H, g'_V)$ with

• $j = \min(i + i', 2)$

• $\mathbb{C}_s = \mathbb{B}_s \cup \mathbb{B}'_s$

• $g_H : G \mapsto f_H(G) + f'_H(G)$

• $g'_V : G \mapsto f_H(G) + f'_V(G)$


On the shallow multicontext level, the definition exactly means that for any $(p,x),
(p',x') \in \mathbb{A}_s^+$ such that $v = \beta(p,x)$ and $v' = \beta(p',x')$, we have $v +_\ell v' = \beta(p + p', x)$ and
$v +_r v' = \beta(p + p', x')$. One can verify that $+_r$ and $+_\ell$ are both semigroup operations.
Moreover, the following fact is immediate from the definitions and states that one can use
the operations $+_\ell$ and $+_r$ to compute the whole set $\mathbb{P}$ from the profiles of one-letter shallow
multicontexts.
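
The two operations on profiles can be sketched in code. The following is only an illustration under stated assumptions: the class `Profile`, the helper `make_sums` and the three primitives `hh`, `vh`, `hv` (forest plus forest, context plus forest, forest plus context) are hypothetical names, and the toy algebra at the end (two types, every sum is `max`) is not taken from the paper.

```python
from typing import Callable, FrozenSet, NamedTuple

Types = FrozenSet[int]  # a set of forest types, or a set of context types


class Profile(NamedTuple):
    arity: int                     # arity of p, counted up to threshold 2
    alphabet: FrozenSet[str]       # labels used in p
    f_H: Callable[[Types], Types]  # forest mapping  G -> p(G)
    f_V: Callable[[Types], Types]  # context mapping G -> p(G, x)


def lift(op):
    """Lift a binary operation on types to sets of types (all pairwise results)."""
    return lambda S, T: frozenset(op(s, t) for s in S for t in T)


def make_sums(hh, vh, hv):
    """Build (+_l, +_r) from three assumed primitives of the forest algebra:
    hh(h, h') = h + h',  vh(v, h) = v + h,  hv(h, v) = h + v."""
    HH, VH, HV = lift(hh), lift(vh), lift(hv)

    def combine(v, w, g_V):
        # Arity, alphabet and forest mapping are the same for both operations.
        return Profile(min(v.arity + w.arity, 2),
                       v.alphabet | w.alphabet,
                       lambda G: HH(v.f_H(G), w.f_H(G)),
                       g_V)

    def plus_l(v, w):  # keep the position of the left argument
        return combine(v, w, lambda G: VH(v.f_V(G), w.f_H(G)))

    def plus_r(v, w):  # keep the position of the right argument
        return combine(v, w, lambda G: HV(v.f_H(G), w.f_V(G)))

    return plus_l, plus_r


# Toy algebra (hypothetical): H = V = {0, 1} and every sum is max.
plus_l, plus_r = make_sums(max, max, max)
empty = frozenset()
leaf = Profile(0, frozenset({"a"}), lambda G: frozenset({1}), lambda G: empty)
port = Profile(1, frozenset({"b"}), lambda G: G, lambda G: frozenset({0}))
```

In this sketch the only difference between the two operations is which argument contributes its context mapping, mirroring the choice of which position is kept in the concatenation.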


Fact 9.1. $\mathbb{P}$ is the smallest subset of $\{0,1,2\} \times 2^{\mathbb{A}_s} \times (2^H)^{2^H} \times (2^V)^{2^H}$ such that:

• $\mathbb{P}$ contains the profiles of one-letter shallow multicontexts: for all $c \in \mathbb{A}_s$, $\beta(c,x) \in \mathbb{P}$
(where $x$ is the unique position in $c$)

• $\mathbb{P}$ is closed under $+_\ell$.

• $\mathbb{P}$ is closed under $+_r$.
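
The characterization of the set of profiles as a smallest closed set suggests a standard worklist computation. The sketch below is generic and illustrative: `closure` is a hypothetical helper, and the toy profiles are simplified triples (arity up to threshold 2, alphabet, label of the kept position) that drop the two mappings so that they stay hashable.

```python
def closure(generators, operations):
    """Smallest superset of `generators` closed under the binary `operations`."""
    closed = set(generators)
    frontier = list(closed)
    while frontier:
        x = frontier.pop()
        # Combine x with everything known so far, in both orders.
        for y in list(closed):
            for op in operations:
                for z in (op(x, y), op(y, x)):
                    if z not in closed:
                        closed.add(z)
                        frontier.append(z)
    return closed


# Simplified toy profiles: (arity up to 2, alphabet, label of the kept position).
def plus_l(v, w):
    return (min(v[0] + w[0], 2), v[1] | w[1], v[2])


def plus_r(v, w):
    return (min(v[0] + w[0], 2), v[1] | w[1], w[2])


letter_a = (0, frozenset("a"), "a")  # a leaf label
letter_b = (1, frozenset("b"), "b")  # a port label
profiles = closure({letter_a, letter_b}, [plus_l, plus_r])
```

The loop terminates because the ambient set is finite, which is exactly the reason the set of profiles is computable.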


Abstracting sets of shallow multicontexts: Configurations. Recall the definition of
saturated contexts: let $P$ be a set of shallow multicontexts, a context is $(P,k)$-saturated iff it
is $P$-valid and for all $(p,x) \in P$, there exists a $\cong_k$-equivalent position on the backbone of the
context. This means that we need to define an abstraction of sets of shallow multicontexts
$P$ that contains two pieces of information:

• the set of $P$-valid types.

• the set of images under $\alpha$ of $(P,k)$-saturated contexts.

For this we introduce the notion of configurations. Notice that this abstraction needs to be
parametrized by the equivalence $\cong_k$. In order to do this, we will abstract this equivalence
on profiles, which are our abstraction of the objects compared by $\cong_k$.

There is an issue however. Intuitively, we want two profiles $u$ and $v$ to be “equivalent”
if one can find $(p,x)$ and $(q,y)$ such that $(p,x) \cong_k (q,y)$, $\beta(p,x) = v$ and $\beta(q,y) = u$.
Unfortunately, this is not the right definition as the relation we obtain is not transitive in
general and hence not an equivalence. This is a problem since the definition of a
saturated context requires picking one position $(p,x)$ among a set of equivalent ones. We
solve this problem by abstracting sets of equivalent positions directly by sets of profiles.

Moreover, if $P \subseteq \mathbb{A}_s^+$, the configuration that abstracts $P$ needs to have exhaustive
information about all sets of equivalent positions that can be found in $P$. Therefore we
define a configuration as a set of sets of profiles, i.e. an element of the set:

$\mathfrak{C} = 2^{2^{\mathbb{P}}}$

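Of course, we are only interested in elements of $\mathfrak{C}$ that correspond to actual sets of
shallow multicontexts. Let $k \in \mathbb{N}$ and $X \subseteq H$, we let $\mathcal{I}_k[\alpha,X]$ be the set of $(X,k)$-relevant
configurations:

$\mathcal{I}_k[\alpha,X] = \{\mathcal{V} \in \mathfrak{C} \mid \exists Q \subseteq \mathbb{A}_s^+$ such that

1. $\forall q,q' \in Q$, $\exists x \in q$, $x' \in q'$ s.t. $(q,x) \cong_k (q',x')$

2. $\forall (q,x) \in Q$ there exist $(q_1,x_1), \dots, (q_n,x_n) \in Q$ with $(q,x) \cong_k (q_1,x_1) \cong_k \cdots \cong_k (q_n,x_n)$ such that
$\{\beta(q_1,x_1),\dots,\beta(q_n,x_n)\} \in \mathcal{V}$

3. $\forall V \in \mathcal{V}$ there exist $(q_1,x_1) \cong_k \cdots \cong_k (q_n,x_n) \in Q$ such that
$V = \{\beta(q_1,x_1),\dots,\beta(q_n,x_n)\}$

$\}$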

Note that condition (1) restricts the definition to sets of shallow multicontexts that are
$\cong_k$-equivalent. This will later be necessary when computing the sets of relevant configurations.
However, when considering saturation, we will actually work with unions of relevant
configurations.

This definition takes care of the quantification over the infinite set of sets of shallow
multicontexts in the definition of saturation. One quantification still needs to be dealt with:
quantification over $k \in \mathbb{N}$. We achieve this by defining the set of $X$-relevant configurations
as the intersection of the previous sets for all $k$:

$\mathcal{I}[\alpha,X] = \bigcap_{k} \mathcal{I}_k[\alpha,X]$

We will present an algorithm for computing $\mathcal{I}[\alpha,X]$ in the second part of this section. The
following fact is immediate from the definitions:

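Fact 9.3. For any $k,k'$ such that $k < k'$ and $X \subseteq H$, $\mathcal{I}[\alpha,X] \subseteq \mathcal{I}_{k'}[\alpha,X] \subseteq \mathcal{I}_k[\alpha,X]$.
In particular, there exists $\ell \in \mathbb{N}$ such that for all $k \geq \ell$ and all $X \subseteq H$, $\mathcal{I}[\alpha,X] = \mathcal{I}_k[\alpha,X]$.

Note that while proving the existence of $\ell$ in Fact 9.3 is simple, computing an actual
bound on $\ell$ is more difficult and will be a consequence of our algorithm computing $\mathcal{I}[\alpha,X]$.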

Finally we equip $\mathfrak{C}$ with a semigroup operation by generalizing the operations $+_\ell$ and
$+_r$ defined on $\mathbb{P}$. Observe that $+_\ell$ and $+_r$ can be generalized to sets of profiles: we define
the sum of two sets as the set of all possible sums of elements of the two sets. If $\mathcal{U}, \mathcal{V} \in \mathfrak{C}$,
we can now define $\mathcal{U} + \mathcal{V}$ as the set

$\{U +_\ell \bigcup_{V \in \mathcal{V}} V \mid U \in \mathcal{U}\} \cup \{\bigcup_{U \in \mathcal{U}} U +_r V \mid V \in \mathcal{V}\}$

The following fact can be verified from the definitions.

Fact 9.4. $(\mathfrak{C}, +)$ is a semigroup.
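
The sum of two configurations can be illustrated as follows. This is a minimal sketch, assuming configurations are represented as frozensets of frozensets of hashable profiles; `config_sum` and `lift_to_sets` are hypothetical names, and the toy profiles (a word together with the index of the kept position) only stand in for real profiles.

```python
def lift_to_sets(op):
    """Sum of two sets of profiles: all possible sums of their elements."""
    return lambda U, V: frozenset(op(u, v) for u in U for v in V)


def config_sum(conf_u, conf_v, plus_l, plus_r):
    """Sum of two configurations (sets of sets of profiles)."""
    set_l, set_r = lift_to_sets(plus_l), lift_to_sets(plus_r)
    all_u = frozenset().union(*conf_u)  # union of all sets in conf_u
    all_v = frozenset().union(*conf_v)  # union of all sets in conf_v
    left = {set_l(U, all_v) for U in conf_u}   # kept position in the left part
    right = {set_r(all_u, V) for V in conf_v}  # kept position in the right part
    return frozenset(left | right)


# Toy profiles: (word, index of the kept position).
def plus_l(v, w):
    return (v[0] + w[0], v[1])


def plus_r(v, w):
    return (v[0] + w[0], len(v[0]) + w[1])


conf_u = frozenset({frozenset({("a", 0)})})
conf_v = frozenset({frozenset({("b", 0)}), frozenset({("c", 0)})})
total = config_sum(conf_u, conf_v, plus_l, plus_r)
```

Each set of the left configuration is summed with the union of the right one (and symmetrically), since a position kept on one side may sit next to any shallow multicontext of the other side.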

Validity and Reachability for Configurations. Let $\mathcal{V}$ be a configuration and let $V \in 2^{\mathbb{P}}$
be the union of all sets in $\mathcal{V}$. The set of $\mathcal{V}$-valid forest types is the smallest $X \subseteq H$ such
that for every $(i, \mathbb{B}_s, f_H, f_V) \in V$, $f_H(H) \subseteq X$ when $i = 0$ and $f_H(X) \subseteq X$ otherwise.

The $\mathcal{V}$-valid context types are defined as the smallest set $Y$ of context types such that
$Y \cdot Y \subseteq Y$ and for all $(i, \mathbb{B}_s, f_H, f_V) \in V$, $f_V(X) \subseteq Y$ (with $X$ the set of $\mathcal{V}$-valid forest types).

Finally, given $\mathcal{V}$-valid forest types $h$ and $h'$, we say that $h$ is $\mathcal{V}$-reachable from $h'$ iff
there exists a context type $v$ that is $\mathcal{V}$-valid and such that $h = vh'$. The following fact can be verified
from the definitions.
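
Since the valid forest types are defined as a smallest set closed under the forest mappings, they can be obtained by iterating from the empty set. The sketch below assumes profiles are given as quadruples whose third component is a monotone Python callable on frozensets; `valid_forest_types` and the toy data are hypothetical.

```python
def valid_forest_types(profiles, H):
    """Least X such that f_H(H) <= X for arity-0 profiles and f_H(X) <= X otherwise."""
    X = frozenset()
    while True:
        grown = set(X)
        for arity, _alphabet, f_H, _f_V in profiles:
            grown |= f_H(H) if arity == 0 else f_H(X)
        if grown == X:  # nothing new was added: X is the least fixpoint
            return X
        X = frozenset(grown)


# Toy data (hypothetical): H = {0, 1, 2} with truncated addition.
H = frozenset({0, 1, 2})
leaf = (0, frozenset("a"), lambda G: frozenset({1}), lambda G: frozenset())
shift = (1, frozenset("ab"),
         lambda G: frozenset(min(1 + g, 2) for g in G),
         lambda G: frozenset())
valid = valid_forest_types([leaf, shift], H)
```

The valid context types could be computed in the same way, with an additional closure step under composition.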

Fact 9.5. Let $Q$ be a set of shallow multicontexts such that $V = \{\beta(q,x) \mid (q,x) \in Q\}$.
Then a forest type $h \in H$ (resp. a context type $v$) is $\mathcal{V}$-valid iff it is $Q$-valid. Moreover, for all $\mathcal{V}$-valid $h, h' \in H$,
$h$ is $\mathcal{V}$-reachable from $h'$ iff $h$ is $Q$-reachable from $h'$.

We say that $\mathcal{V}$ is branching iff $V$ contains a profile of arity 2. One can verify that this
implies the existence of a maximal $\mathcal{V}$-reachability class denoted $H_{\mathcal{V}}$. Finally, we say that a
branching $\mathcal{V}$ is reduced when all $\mathcal{V}$-valid forest types are mutually reachable, i.e. $H_{\mathcal{V}}$ is the
whole set of $\mathcal{V}$-valid forest types.

Profile Saturation. We are now ready to rephrase saturation as a property of the sets
$\mathcal{I}[\alpha,X]$. Let $X \subseteq H$, we say that a configuration $\mathcal{V} \in \mathfrak{C}$ is $X$-compatible iff it is branching,
it is reduced, $H_{\mathcal{V}} = X$ and $\mathcal{V} = \bigcup_i \mathcal{V}_i$ with $\mathcal{V}_i \in \mathcal{I}[\alpha,X]$ for all $i$.

Let $\mathcal{V}$ be an $X$-compatible configuration. We say that a context type $v$ is $\mathcal{V}$-saturated iff there
exist $v_1, \dots, v_n$ with $v_1 \cdots v_n = v$ such that:

• for all $j$, $v_j$ is $\mathcal{V}$-valid.

• for all $V \in \mathcal{V}$ there exists $(i, \mathbb{B}_s, f_H, f_V) \in V$ such that either $i = 0$ (i.e. $V$ abstracts a
set of non-port nodes) or $v_j \in f_V(H_{\mathcal{V}})$ for some $j$.

Let $\ell$ be as defined in Fact 9.3. The following fact is a simple consequence of the definitions.

Fact 9.6. Let $X \subseteq H$ and let $v$ be an idempotent of $V$. For every $k \geq \ell$ the following properties
are equivalent:

(1) There exists a branching and reduced $Q \subseteq \mathbb{A}_s^+$ such that $X = H_Q$ and $v$ is the image of
some $(Q,k)$-saturated context.

(2) There exists an $X$-compatible $\mathcal{V} \in \mathfrak{C}$ such that $v$ is $\mathcal{V}$-saturated.

Proof sketch. From top to bottom. Let $Q$ and $v$ be as in (1). Let $\Delta$ be the $(Q,k)$-saturated
context such that $\alpha(\Delta) = v$. We construct $\mathcal{V}$ and $v_1 \cdots v_n$ witnessing (2) as follows. To each
$(q,x) \in Q$, we associate the set $U = \{\beta(q',x') \mid (q',x') \cong_k (q,x)\}$. We then set $\mathcal{V}$ as the set
of all such sets $U$. It follows from the definition and Fact 9.5 that $H_Q = X = H_{\mathcal{V}}$. Moreover,
$\mathcal{V}$ is by definition a union of elements of $\mathcal{I}_k[\alpha,X]$ (and hence of $\mathcal{I}[\alpha,X]$ by definition of $k$)
and is therefore $X$-compatible. It is then immediate to check that the $(Q,k)$-saturation of
$\Delta$ implies the existence of $v_1 \cdots v_n$ with the desired properties. Note that we did not use
the hypothesis that $v$ is idempotent; it is only required for the other direction.

From bottom to top. Let $\mathcal{V}$ and $v_1, \dots, v_n$ be as required for (2). By definition, we have
$\mathcal{V} = \bigcup_i \mathcal{V}_i$ where each $\mathcal{V}_i$ is $H_{\mathcal{V}}$-relevant and therefore $(H_{\mathcal{V}},k)$-relevant. This means that for
all $i$, there exists a set of shallow multicontexts $Q_i$ for $\mathcal{V}_i$ as in the definition of $\mathcal{I}_k[\alpha,H_{\mathcal{V}}]$. We
set $Q = \bigcup_i Q_i$. It is immediate from the definition of $Q$ and Fact 9.5 that $H_Q = X = H_{\mathcal{V}}$
and that $v_1,\dots,v_n$ are $Q$-valid. We construct the desired $(Q,k)$-saturated context $\Delta$ as
follows. For any node $(p,x) \in Q$, we construct a $Q$-valid context of type $v$ having a node
$(p',x')$ on its backbone satisfying $(p',x') \cong_k (p,x)$. It will then suffice to define $\Delta$ as the
concatenation of all these contexts. Since $v$ is idempotent, $\Delta$ will have type $v$ as well.

Let $(p,x)$ with $p$ in $Q$ and $x$ a port-node of $p$. By definition, there exist some $\mathcal{V}_l$ in the union, some
$V \in \mathcal{V}_l$ and some $(i,\mathbb{B}_s,f_H,f_V) \in V$ such that $(i,\mathbb{B}_s,f_H,f_V) = \beta(q,y)$ with $(p,x) \cong_k (q,y)$.
As $x$ is a port-node, so is $y$ and we have $f_V(H_Q) \neq \emptyset$ and therefore by $\mathcal{V}$-saturation of $v$,
we get $(i',\mathbb{B}'_s,f'_H,f'_V) \in V$ such that $f'_V(H_Q)$ contains $v_j$ for some $j$. By definition, we get
$(p',x') \cong_k (q,y) \cong_k (p,x)$ such that $\beta(p',x') = (i',\mathbb{B}'_s,f'_H,f'_V)$. Hence we can create a
$Q$-valid context of type $v_j$, with a unique position $(p',x')$ on its backbone. Since $v_1,\dots,v_n$
are all $Q$-valid, this context can then be completed into a $Q$-valid context of type $v$, which
terminates the proof. □

We say that $\alpha$ is closed under profile saturation iff for all $X \subseteq H$, for all $X$-compatible
$\mathcal{V} \in \mathfrak{C}$, for all context types $v$ that are $\mathcal{V}$-saturated and all $h_1, h_2 \in H_{\mathcal{V}}$:

$v^\omega h_1 = v^\omega h_2$

Observe that all quantifications in the definition range over finite sets. Therefore, if one can
compute the $X$-compatible configurations for all $X$, one can decide closure under profile
saturation by testing all possible combinations. In the next proposition, we prove that this
is equivalent to testing closure under saturation.

Proposition 9.7. Let $\alpha : \mathbb{A}^\Delta \to (H,V)$ be a morphism into a finite forest algebra. Then
the following three properties are equivalent:

(1) $\alpha$ is closed under saturation.

(2) $\alpha$ is closed under $\ell$-saturation.

(3) $\alpha$ is closed under profile saturation.

Proof. We prove that 1) ⇒ 3) ⇒ 2) ⇒ 1). That 2) ⇒ 1) is immediate by definition of
saturation.

We now prove 1) ⇒ 3). Assume that $\alpha$ is closed under saturation. By Lemma 6.1, $\alpha$
is closed under $k$-saturation for some $k \geq \ell$. We need to prove that $\alpha$ is closed under profile
saturation. Let $X \subseteq H$, $\mathcal{V} \in \mathfrak{C}$ that is $X$-compatible, $v$ a context type that is $\mathcal{V}$-saturated and
$h_1,h_2 \in H_{\mathcal{V}}$. Using Fact 9.6 we get $Q \subseteq \mathbb{A}_s^+$ such that $H_{\mathcal{V}} = H_Q$ and $v^\omega$ is the image of
some $(Q,k)$-saturated context. It is now immediate from $k$-saturation that $v^\omega h_1 = v^\omega h_2$.

It remains to prove that 3) ⇒ 2). Assume that $\alpha$ is closed under profile saturation. We
need to prove that $\alpha$ is closed under $\ell$-saturation. Let $Q \subseteq \mathbb{A}_s^+$, $\Delta$ that is $(Q,\ell)$-saturated
and $h_1,h_2 \in H_Q$. Using Fact 9.6 we get $\mathcal{V} \in \mathfrak{C}$ such that $H_{\mathcal{V}} = H_Q$ and $\alpha(\Delta^\omega)$ is $\mathcal{V}$-saturated.
It is now immediate from profile saturation that $\alpha(\Delta)^\omega h_1 = \alpha(\Delta)^\omega h_2$. □

In view of Proposition 9.7 it is enough to show that closure under profile saturation is
decidable in order to prove Theorem 9.1. Because all the quantifications inside the definition
of profile saturation range over finite sets, it is enough to show that those finite sets, namely
$\mathcal{I}[\alpha,X]$ for all $X \subseteq H$, can be computed.

This is immediate in the case of ranked trees. Indeed for trees of bounded rank, the set of
legal shallow multicontexts is finite. Therefore $\mathcal{I}[\alpha,X]$ can be
computed by considering all the finitely many possible sets $Q$ of legal shallow multicontexts. Hence Theorem 9.1 is
proved for regular languages of ranked trees.

In the general case it is not obvious how to compute $\mathcal{I}[\alpha,X]$ and this is the goal of the
remaining part of this section.

9.2. Computing the Sets of X-indistinguishable Configurations. We present an
algorithm which, given as input $\alpha : \mathbb{A}^\Delta \to (H,V)$ and $X \subseteq H$, computes the set $\mathcal{I}[\alpha,X]$.
This is a fixpoint algorithm that starts from trivial configurations corresponding to sets of
shallow multicontexts that are singletons composed of a single-letter shallow multicontext
and saturates the set with two operations.

Our first operation is the semigroup operation on $\mathfrak{C}$ (recall Fact 9.4), which corresponds
to concatenating shallow multicontexts. Our second and most important operation is derived
from a well-known property of $\mathrm{FO}^2(<)$ on strings. Let $C$ be a finite string alphabet
and let $u, u' \in C^+$ such that $u, u'$ both contain all labels in $C$. Then for all $k \in \mathbb{N}$ and any
$u'' \in C^+$:

$(u)^k u'' (u')^k \equiv_k (u)^k (u')^k$

In our case however, the situation will be slightly more complicated as we work with the
weaker equivalence $\cong_k$ in which tests on labels are relaxed.
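
The string property above can be checked on small instances with a brute-force game solver. The following sketch assumes the usual two-pebble, $k$-round Ehrenfeucht-Fraïssé game for two-variable first-order logic over the order, in which only the most recently placed pebble pair constrains the next move; the function name `fo2_equivalent` is hypothetical and the solver is exponential, so it is only meant for tiny words.

```python
from functools import lru_cache


def fo2_equivalent(u, v, k):
    """True iff Duplicator wins the k-round two-pebble game on the words u and v."""

    def cmp(i, j):
        return (i > j) - (i < j)

    @lru_cache(maxsize=None)
    def duplicator_wins(rounds, a, b):
        # (a, b): positions in u and v of the last pebble pair (None before round 1).
        if rounds == 0:
            return True
        for word, other, swapped in ((u, v, False), (v, u, True)):
            here, there = (b, a) if swapped else (a, b)
            for i in range(len(word)):          # Spoiler's move in `word`
                answered = False
                for j in range(len(other)):     # Duplicator's candidate answers
                    if word[i] != other[j]:
                        continue
                    if here is not None and cmp(i, here) != cmp(j, there):
                        continue
                    nxt = (j, i) if swapped else (i, j)
                    if duplicator_wins(rounds - 1, *nxt):
                        answered = True
                        break
                if not answered:
                    return False
        return True

    return duplicator_wins(k, None, None)
```

For instance, with $u = u' = ab$, $u'' = b$ and $k = 2$, the call `fo2_equivalent("ababbabab", "abababab", 2)` compares the two sides of the displayed equivalence.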

Remark 9.8. By definition, for any $\mathcal{V} \in \mathcal{I}[\alpha,X]$ all profiles contained in sets of $\mathcal{V}$ have the
same alphabet. Therefore, we will assume implicitly that this is true of all sets of profiles
we consider from now on, and whenever we refer to “the alphabet of $\mathcal{V}$” we mean this common
alphabet.

Fixpoint Algorithm. Recall from Fact 9.4 that $\mathfrak{C}$ is equipped with a semigroup operation.
We start with a few definitions about alphabets that we will need in order to present the
algorithm. To each alphabet $\mathbb{B}_s \subseteq \mathbb{A}_s$, we associate a configuration $[[\mathbb{B}_s]]$ as follows,

$[[\mathbb{B}_s]] = \{\{\beta(p,x) \mid p \text{ has alphabet } \mathbb{B}_s \text{ and } x \text{ has label } c\} \mid c \in \mathbb{B}_s\} \in \mathfrak{C}$

Observe that for any $\mathbb{B}_s$, it is simple to compute $[[\mathbb{B}_s]]$ from $\alpha$. Indeed, for any $c \in \mathbb{B}_s$, if $y$
denotes the unique position in the shallow multicontext $c$, one can verify that,

$\{\beta(p,x) \mid p \text{ has alphabet } \mathbb{B}_s \text{ and } x \text{ has label } c\} = \{v \in \mathbb{P} +_r \beta(c,y) +_\ell \mathbb{P} \mid v \text{ has alphabet } \mathbb{B}_s\}$

An important remark is that while $[[\mathbb{B}_s]]$ is a configuration, in general, it is not an $X$-relevant
configuration (for any $X$). The main idea behind the fixpoint algorithm is that $[[\mathbb{B}_s]]$ can
become $X$-relevant if one adds “appropriate” $X$-relevant configurations to its left and to
its right. The definition of “appropriate” is based on the notion of $X$-approximation of an
alphabet that we define now.

In the $X$-relaxed game, there are three types of nodes, port-nodes, A-nodes and X-nodes.
Let $\mathbb{B}_s \subseteq \mathbb{A}_s$, and let $c,c' \in \mathbb{B}_s$. We say that $c,c'$ are $\mathbb{B}_s[X]$-equivalent iff $c = c'$
or there exists $b(\square) \in \mathbb{B}_s$ such that $c,c'$ are port-nodes or A-nodes labels of inner label $b$.
Finally, an $X$-approximation of $\mathbb{B}_s$ is an alphabet $\mathbb{C}_s \subseteq \mathbb{B}_s$ such that for any $c \in \mathbb{B}_s$, there exists
$c' \in \mathbb{C}_s$ that is $\mathbb{B}_s[X]$-equivalent to $c$.

We can now present the algorithm. Set $\mathcal{T}[\alpha] \subseteq \mathcal{I}[\alpha,X]$ for all $X$ as the set of configurations
associated to sets of shallow multicontexts of the form $\{c\}$ where $c$ is a single letter
in $\mathbb{A}_s$. More precisely, $\mathcal{T}[\alpha]$ is the set of configurations:

$\{\{\{\beta(c,x)\}\} \mid c \in \mathbb{A}_s \text{ and } x \text{ the unique position in } c\}$

We set $Sat[X,\alpha]$ as the smallest set $S \subseteq \mathfrak{C}$ containing $\mathcal{T}[\alpha]$ and such that:

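(1) For all $\mathcal{V},\mathcal{V}' \in S$, $\mathcal{V} + \mathcal{V}' \in S$.

(2) For all $\mathbb{B}_s \subseteq \mathbb{A}_s$, if $\mathcal{V},\mathcal{V}' \in S$ have (possibly different) alphabets that are both $X$-approximations
of $\mathbb{B}_s$, then $\omega\mathcal{V} + [[\mathbb{B}_s]] + \omega\mathcal{V}' \in S$,

where $\omega = \omega(\mathfrak{C})$. Clearly $Sat[X,\alpha]$ can be computed from $\alpha$. It is connected to $\mathcal{I}[\alpha,X]$ via
the proposition below. For $\mathcal{U}_1,\mathcal{U}_2 \in \mathfrak{C}$, we write $\mathcal{U}_1 \sqsubseteq \mathcal{U}_2$ iff

(1) For every $V_1 \in \mathcal{U}_1$ there exists $V_2 \in \mathcal{U}_2$ such that $V_1 \subseteq V_2$.

(2) For every $V_2 \in \mathcal{U}_2$ there exists $V_1 \in \mathcal{U}_1$ such that $V_1 \subseteq V_2$.

One can verify that $\sqsubseteq$ is a preorder. If $\mathfrak{U} \subseteq \mathfrak{C}$, the downset of $\mathfrak{U}$ is the set $\downarrow\mathfrak{U} = \{\mathcal{V} \mid \exists\, \mathcal{U} \in
\mathfrak{U} \text{ such that } \mathcal{V} \sqsubseteq \mathcal{U}\}$.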
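
Two ingredients of the algorithm are easy to make concrete: the idempotent multiple written with $\omega$ in Operation (2), and the preorder used to define downsets. The sketch below is illustrative only; `idempotent_power`, `below` and `in_downset` are hypothetical names, the semigroup is passed as a Python function `add`, and configurations are assumed to be collections of frozensets.

```python
def idempotent_power(x, add):
    """Return the idempotent among x, x+x, x+x+x, ... in a finite semigroup."""
    power = x
    while add(power, power) != power:
        power = add(power, x)  # move on to the next multiple of x
    return power


def below(conf1, conf2):
    """The preorder on configurations: both inclusion conditions must hold."""
    covered = all(any(V1 <= V2 for V2 in conf2) for V1 in conf1)
    refined = all(any(V1 <= V2 for V1 in conf1) for V2 in conf2)
    return covered and refined


def in_downset(conf, family):
    """True iff conf is below some configuration of the family."""
    return any(below(conf, upper) for upper in family)
```

The loop in `idempotent_power` terminates because the multiples of an element of a finite semigroup always include an idempotent. With these helpers, a naive implementation of the fixpoint would repeatedly apply Operations (1) and (2) until no new configuration appears.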
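
Fact 9.9. Within $\mathfrak{C}$, $+$ is compatible with $\sqsubseteq$ (i.e. $\mathcal{V}_1 \sqsubseteq \mathcal{V}_2$ and $\mathcal{U}_1 \sqsubseteq \mathcal{U}_2$ implies $\mathcal{V}_1 + \mathcal{U}_1 \sqsubseteq
\mathcal{V}_2 + \mathcal{U}_2$).

Proposition 9.10. Let $\ell = 2|\mathbb{A}_s|^2(|\mathfrak{C}| + 1)$ and $X \subseteq H$, then for any $k \geq \ell$:

$\mathcal{I}[\alpha,X] = \mathcal{I}_k[\alpha,X] = \downarrow Sat[X,\alpha]$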

It follows from Proposition 9.10 that $\mathcal{I}[\alpha,X]$ can be computed for any $X \subseteq H$. By
combining this with Proposition 9.7 we obtain the desired corollary:

Corollary 9.11. Let $\alpha : \mathbb{A}^\Delta \to (H,V)$ be a morphism into a finite forest algebra. It is
decidable whether $\alpha$ is closed under saturation.

Observe that Proposition 9.10 also contains a bound for $\ell$ in Fact 9.3. This bound is
of particular interest: as explained in Proposition 9.7, $\ell$ is also a bound for saturation, that is, if $\alpha$
is closed under saturation, then it is closed under $\ell$-saturation.

It now remains to prove Proposition 9.10. We prove that for any $k \geq \ell$, $\mathcal{I}[\alpha,X] \subseteq
\mathcal{I}_k[\alpha,X] \subseteq \downarrow Sat[X,\alpha] \subseteq \mathcal{I}[\alpha,X]$. Observe that $\mathcal{I}[\alpha,X] \subseteq \mathcal{I}_k[\alpha,X]$ is immediate by Fact 9.3.
We give the two remaining inclusions their own subsections.

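9.3. Proof of Correctness. We prove that $Sat[X,\alpha] \subseteq \mathcal{I}[\alpha,X]$. One can then verify that
$\downarrow\mathcal{I}[\alpha,X] = \mathcal{I}[\alpha,X]$ and therefore that $\downarrow Sat[X,\alpha] \subseteq \mathcal{I}[\alpha,X]$. Recall that $\mathcal{I}[\alpha,X]$ is defined
as $\bigcap_k \mathcal{I}_k[\alpha,X]$. This means that it suffices to prove that for all $k \in \mathbb{N}$, $Sat[X,\alpha] \subseteq
\mathcal{I}_k[\alpha,X]$. We fix such a $k \in \mathbb{N}$ for the remainder of the proof.

By definition, $\mathcal{T}[\alpha] \subseteq \mathcal{I}_k[\alpha,X]$ for every $k \in \mathbb{N}$. We prove that $\mathcal{I}_k[\alpha,X]$ is closed under
the two operations in the definition of $Sat$. We begin with Operation (1).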

Operation (1). Let $\mathcal{V}, \mathcal{V}' \in \mathcal{I}_k[\alpha,X]$ and let $Q,Q'$ be the sets of shallow multicontexts
witnessing the membership of $\mathcal{V}$ and $\mathcal{V}'$ in $\mathcal{I}_k[\alpha,X]$. Set $R = \{q + q' \mid q \in Q \text{ and } q' \in Q'\}$,
we prove that $R$ witnesses the membership of $\mathcal{V} + \mathcal{V}'$ in $\mathcal{I}_k[\alpha,X]$. This is a consequence of
Fact 9.2 and the following lemma:

Lemma 9.12. Let $(p_1,x_1),(p_2,x_2) \in Q$ such that $(p_1,x_1) \cong_k (p_2,x_2)$ and $(p'_1,x'_1),(p'_2,x'_2) \in
Q'$ such that $(p'_1,x'_1) \cong_k (p'_2,x'_2)$. Then if $r_1 = p_1 + p'_1$ and $r_2 = p_2 + p'_2$,

$(r_1,x_1) \cong_k (r_2,x_2)$ and $(r_1,x'_1) \cong_k (r_2,x'_2)$

Proof. This is a composition lemma whose proof is immediate using Ehrenfeucht-Fraïssé
games. □

We have three conditions to check. That (1) holds is immediate from Lemma 9.12.
For (2), if $(r,x) \in R$, we have $r = p + p'$ with $p \in Q$ and $p' \in Q'$. By symmetry, assume that
$x \in p$. By definition of $Q$, we get $(p_1,x_1),\dots,(p_n,x_n) \in Q$ such that $(p,x) \cong_k (p_1,x_1) \cong_k
\cdots \cong_k (p_n,x_n)$ and $V = \{\beta(p_j,x_j) \mid j \leq n\} \in \mathcal{V}$. Set $V'$ as the union of all sets in $\mathcal{V}'$ and set
$R'$ as the set of pairs $(q,y)$ such that $q = p_j + p''$ with $j \leq n$ and $p'' \in Q'$. By Lemma 9.12,
for all $(q,y) \in R'$, $(r,x) \cong_k (q,y)$. Moreover, one can verify using Fact 9.2 that

$\beta(R') = V +_\ell V' \in \mathcal{V} + \mathcal{V}'$

It remains to verify (3). Let $U \in \mathcal{V} + \mathcal{V}'$. By symmetry, assume that $U = V +_\ell V'$ with
$V \in \mathcal{V}$ and $V'$ the union of all sets of $\mathcal{V}'$. By definition of $Q,Q'$ there exist $(p_1,x_1) \cong_k
\cdots \cong_k (p_n,x_n) \in Q$ such that $V = \{\beta(p_i,x_i) \mid i \leq n\}$. Using the same set $R'$ of pairs $(q,y)$
as above we get that all pairs in $R'$ are $\cong_k$-equivalent and

$\beta(R') = V +_\ell V' = U$

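Operation (2). Let $\mathbb{B}_s \subseteq \mathbb{A}_s$ and $\mathcal{V}, \mathcal{V}' \in \mathcal{I}_k[\alpha,X]$ having alphabets $\mathbb{C}_s,\mathbb{C}'_s$ that are $X$-approximations
of $\mathbb{B}_s$ and let $V$ and $V'$ be the unions of all sets in $\mathcal{V}$ and $\mathcal{V}'$ respectively.
Let $Q$ and $Q'$ be the sets of shallow multicontexts witnessing the membership of $\mathcal{V}$ and $\mathcal{V}'$
in $\mathcal{I}_k[\alpha,X]$. Furthermore, set $R$ as the set of all shallow multicontexts of alphabet $\mathbb{B}_s$. We
prove that $P = k\omega Q + R + k\omega Q'$ witnesses the fact that $k\omega\mathcal{V} + [[\mathbb{B}_s]] + k\omega\mathcal{V}' = \omega\mathcal{V} + [[\mathbb{B}_s]] + \omega\mathcal{V}'$
belongs to $\mathcal{I}_k[\alpha,X]$. Most of the proof is based on the following property of the equivalence
$\cong_k$.

Lemma 9.13. Let $q_1,q_2 \in k\omega Q$, $q'_1,q'_2 \in k\omega Q'$ and $r_1,r_2 \in R$. Then for all nodes $x_1,x_2$
of $r_1,r_2$ with the same label $c \in \mathbb{B}_s$,

$(q_1 + r_1 + q'_1, x_1) \cong_k (q_2 + r_2 + q'_2, x_2)$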

Proof. We give a winning strategy for Duplicator in the $X$-relaxed game. We simplify the
argument by assuming that $q_1 = q_2 = k\omega q$ for some $q \in Q$ and that $q'_1 = q'_2 = k\omega q'$
for some $q' \in Q'$. Since all shallow multicontexts in $Q$ (resp. $Q'$) are $\cong_k$-equivalent (see
Item (1) in the definition of $\mathcal{I}_k[\alpha,X]$), one can then obtain a strategy for the general case
by adapting this special case.

Recall that $q$ (resp. $q'$) has alphabet $\mathbb{C}_s$ (resp. $\mathbb{C}'_s$) and that $\mathbb{C}_s$ and $\mathbb{C}'_s$ are $X$-approximations
of $\mathbb{B}_s$. Therefore, using a standard game argument, one can verify that
Duplicator can win $k$ moves of the $X$-relaxed game between $(k\omega q + r_1 + k\omega q', x_1)$ and
$(k\omega q + r_2 + k\omega q', x_2)$ as long as no safety move is played. In case a safety move is played, the
$X$-approximation hypothesis guarantees that $r_1,r_2$ of alphabet $\mathbb{B}_s$ contain the appropriate
letter with a port-node label. □

We now prove that the set $P$ satisfies the definition of $\mathcal{I}_k[\alpha,X]$ for $k\omega\mathcal{V} + [[\mathbb{B}_s]] + k\omega\mathcal{V}'$.
We have three conditions to verify. That (1) holds is immediate from Lemma 9.13.

For (2), consider $(p,x) \in P$. By definition, $p = q + r + q'$ with $r \in R$, $q \in k\omega Q$
and $q' \in k\omega Q'$. We treat the case when $x \in r$ (the other cases are treated with a similar
argument). Let $c$ be the label of $x$ and let $(r_1,x_1),\dots,(r_n,x_n)$ be all nodes such that $r_i \in R$
and $x_i$ is a node of $r_i$ of label $c$. By definition of $[[\mathbb{B}_s]]$ and $R$, we have

$U = \{\beta(r_1,x_1),\dots,\beta(r_n,x_n)\} \in [[\mathbb{B}_s]]$

Let $(p_1,y_1),\dots,(p_m,y_m)$ be all nodes such that $p_i \in P$ and $y_i$ is a node of label $c$ in the
“R-part” of $p_i$. From Lemma 9.13 we have:

$(p,x) \cong_k (p_1,y_1) \cong_k \cdots \cong_k (p_m,y_m)$

Observe that viewed as nodes of the “R-part” of $p_i$, the nodes $y_i$ are exactly the nodes $x_j$.
Using Fact 9.2 one can then verify that

$\{\beta(p_i,y_i) \mid i \leq m\} = V +_r \cdots +_r V +_r U +_\ell V' +_\ell \cdots +_\ell V' \in k\omega\mathcal{V} + [[\mathbb{B}_s]] + k\omega\mathcal{V}'$

where $V$ and $V'$ each occur $k\omega$ times.

It remains to prove (3). Let $U \in \omega\mathcal{V} + [[\mathbb{B}_s]] + \omega\mathcal{V}'$. Again, we concentrate on the case when

$U = V +_r \cdots +_r V +_r W +_\ell V' +_\ell \cdots +_\ell V'$

where $V$ and $V'$ each occur $k\omega$ times,
with $W \in [[\mathbb{B}_s]]$ (other cases are treated in a similar way). By definition of $[[\mathbb{B}_s]]$, we have

$W = \{\beta(r_1,x_1),\dots,\beta(r_n,x_n)\} \in [[\mathbb{B}_s]]$

with $(r_1,x_1),\dots,(r_n,x_n)$ as all nodes such that $r_i \in R$ and $x_i$ is a node of $r_i$ of label $c$ for
some fixed $c$. Let $(p_1,y_1),\dots,(p_m,y_m) \in P$ be all the nodes of $P$ such that
$y_i$ is a node of $p_i$ with label $c$ in the “R-part” of $p_i$. By Lemma 9.13 we have

$(p_1,y_1) \cong_k \cdots \cong_k (p_m,y_m)$

Observe that viewed as nodes of the “R-part” of $p_i$, the nodes $y_i$ correspond exactly to the
nodes $x_j$. Using Fact 9.2 one can then verify that

$\{\beta(p_i,y_i) \mid i \leq m\} = U$


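9.4. Proof of Completeness. Let $\ell$ be defined as in Proposition 9.10. We prove that for
any $k \geq \ell$, $\mathcal{I}_k[\alpha,X] \subseteq \downarrow Sat[X,\alpha]$. We will need the following definition.

Let $k \in \mathbb{N}$, $X \subseteq H$. To every shallow multicontext $q \in \mathbb{A}_s^+$, we associate a configuration
$\mathcal{G}_k[X](q) \in \mathcal{I}_k[\alpha,X]$. To any node $(p,x)$ we associate the set of profiles
$\{\beta(p',x') \mid (p,x) \cong_k (p',x')\}$ of all nodes equivalent to it. We set

$\mathcal{G}_k[X](q) = \{\{\beta(p',x') \mid (q,y) \cong_k (p',x')\} \mid y \in q\}$

The following two facts are immediate consequences of the definitions:

Fact 9.14. For all $k \leq k' \in \mathbb{N}$, $X \subseteq H$ and $q \in \mathbb{A}_s^+$ we have $\mathcal{G}_{k'}[X](q) \sqsubseteq \mathcal{G}_k[X](q)$.

Fact 9.15. For all $k \in \mathbb{N}$ and $X \subseteq H$ we have $\mathcal{I}_k[\alpha,X] = \downarrow\{\mathcal{G}_k[X](q) \mid q \in \mathbb{A}_s^+\}$.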

We can now finish the proof of Proposition 9.10. The proof is by induction on the size
of the alphabet as stated in the proposition below.

Proposition 9.16. Let $\mathbb{B}_s \subseteq \mathbb{A}_s$, $k \geq 2|\mathbb{B}_s|^2(|\mathfrak{C}| + 1)$ and $p$ a shallow multicontext such that
$p$ contains only labels in $\mathbb{B}_s$. Then $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$.

Using Proposition 9.16 with $\mathbb{B}_s = \mathbb{A}_s$, we obtain that for any $k \geq \ell$ and any $p \in \mathbb{A}_s^+$,
we have $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$. It then follows from Fact 9.15 that $\mathcal{I}_k[\alpha,X] \subseteq \downarrow Sat[X,\alpha]$,
which terminates the proof of Proposition 9.10. It now remains to prove Proposition 9.16.
The remainder of the section is devoted to this proof.

For the sake of simplifying the presentation, we assume that $p$ can be an empty shallow
multicontext denoted ‘$\varepsilon$’ and that $Sat[X,\alpha]$ contains an artificial neutral element ‘0’ such
that $\mathcal{G}_k[X](\varepsilon) = 0$ for any $k$. As $\varepsilon$ will be the only shallow multicontext having that property,
this does not harm the generality of the proof.

As explained above, the proof is by induction on the size of $\mathbb{B}_s$. The base case happens
when $\mathbb{B}_s = \emptyset$. In that case, $p = \varepsilon$ and $\mathcal{G}_k[X](\varepsilon) \in Sat[X,\alpha]$ by definition. Assume now that
$\mathbb{B}_s \neq \emptyset$, we set $k \geq 2|\mathbb{B}_s|^2(|\mathfrak{C}| + 1)$ and $p$ as a shallow multicontext containing only labels in
$\mathbb{B}_s$. We need to prove that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$.

First observe that when $p$ does not contain all labels in $\mathbb{B}_s$, the result is immediate by
induction. Therefore, assume that $p$ contains all labels in $\mathbb{B}_s$. We proceed as follows. First,
we define a new notion called a $(\mathbb{B}_s[X],n)$-pattern. Intuitively, a shallow multicontext $q$
contains a $(\mathbb{B}_s[X],n)$-pattern iff all labels in $\mathbb{B}_s$ (modulo $\mathbb{B}_s[X]$-equivalence) are repeated at
least $n$ times in $q$. Then, we prove that if $p$ contains a $(\mathbb{B}_s[X],n)$-pattern for a large enough
$n$, then $\mathcal{G}_k[X](p)$ can be decomposed in such a way that it can be proved to be in $\downarrow Sat[X,\alpha]$
by using induction on the factors, and Operations (1) and (2) to compose them. Otherwise,
we prove that $\mathcal{G}_k[X](p)$ can be decomposed as a sum of bounded length whose elements
can be proved to be in $\downarrow Sat[X,\alpha]$ by induction. We then conclude using Operation (1). We
begin with the definition of $(\mathbb{B}_s[X],n)$-patterns.

$(\mathbb{B}_s[X],n)$-patterns. Consider the $\mathbb{B}_s[X]$-equivalence of labels in $\mathbb{B}_s$ and let $m$ be the
number of equivalence classes. We fix an arbitrary order on these classes that we denote
by $C_0,\dots,C_{m-1} \subseteq \mathbb{B}_s$. Recall that $\mathbb{C}_s$ is an $X$-approximation of $\mathbb{B}_s$ iff $\mathbb{C}_s$ contains at
least one element of each class. Let $n \in \mathbb{N}$. We say that a shallow multicontext $q$ contains
a $(\mathbb{B}_s[X],n)$-pattern iff $q$ can be decomposed as

$q = q_0 + c_0 + q_1 + c_1 + \cdots + q_n + c_n + q_{n+1}$

such that for all $i \leq n$, $c_i \in C_j$ (with $j = i \bmod m$) and $q_i$ is a (possibly empty) shallow
multicontext. In particular, the decomposition above is called the leftmost decomposition iff
for all $i \leq n$ no label in $C_j$ (with $j = i \bmod m$) occurs in $q_i$. Symmetrically, in the rightmost
decomposition, for all $i \geq 0$, no label in $C_j$ (with $j = i \bmod m$) occurs in $q_{i+1}$. Observe that
by definition the leftmost and rightmost decompositions are unique. In the proof, we use
the following decomposition lemma.

Lemma 9.17 (Decomposition Lemma). Let $n \in \mathbb{N}$. Let $q$ be a shallow multicontext that
contains a $(\mathbb{B}_s[X],n)$-pattern and let $q = q_0 + c_0 + \cdots + c_n + q_{n+1}$ be the associated leftmost
or rightmost decomposition. Then

$\mathcal{G}_k[X](q) \sqsubseteq \mathcal{G}_{k-n}[X](q_0) + \mathcal{G}_{k-n}[X](c_0) + \cdots + \mathcal{G}_{k-n}[X](c_n) + \mathcal{G}_{k-n}[X](q_{n+1})$

Proof. This is a simple Ehrenfeucht-Fraïssé game argument. Because of the missing boundary
labels within the $q_j$, using at most $n$ moves, Spoiler can make sure that the game stays
within the appropriate segment $q_j$ and can use the remaining $k - n$ moves for describing
that segment. □
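
The leftmost decomposition can be computed greedily in one left-to-right scan. The sketch below is an illustration under simplifying assumptions: a shallow multicontext is given as a list of labels, `class_of` is a hypothetical dictionary mapping each label to the index of its equivalence class, and the function returns the longest leftmost decomposition it can find.

```python
def leftmost_pattern(q, class_of, m):
    """Greedy leftmost decomposition q = s0 + c0 + s1 + c1 + ... + cn + s(n+1).

    Returns (segments, letters); letter number i is the first label of
    class (i mod m) occurring after letter number i-1.
    """
    segments, letters, current = [], [], []
    for label in q:
        if class_of[label] == len(letters) % m:
            # First occurrence of the awaited class: close the current segment.
            segments.append(current)
            letters.append(label)
            current = []
        else:
            current.append(label)
    segments.append(current)  # trailing segment after the last letter
    return segments, letters
```

If the scan finds $n'+1$ letters, then $n'$ is the largest $n$ for which the input contains a pattern, which is the quantity used in the case distinction that follows.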

This finishes the definition of patterns. Set $n = m(|\mathfrak{C}| + 1)$. We now consider two cases
depending on whether our shallow multicontext $p$ contains a $(\mathbb{B}_s[X],2n)$-pattern.

Case 1: $p$ does not contain a $(\mathbb{B}_s[X],2n)$-pattern. In that case we conclude using
induction and Operation (1). Let $n'$ be the largest number such that $p$ contains a $(\mathbb{B}_s[X],n')$-pattern.
By hypothesis $n' < 2n$. Let $p = p_0 + c_0 + \cdots + c_{n'} + p_{n'+1}$ be the associated leftmost
decomposition. Observe that by definition, for $i \leq n'$, $p_i$ uses a strictly smaller alphabet
than $\mathbb{B}_s$. Moreover, since $p$ does not contain a $(\mathbb{B}_s[X],n'+1)$-pattern this is also the case
for $p_{n'+1}$. Set $\hat{k} = k - n'$; by choice of $k$, we have $\hat{k} \geq 2(|\mathbb{B}_s| - 1)^2(|\mathfrak{C}| + 1)$. Therefore, we
can use our induction hypothesis and for all $i$ we get,

$\mathcal{G}_{\hat{k}}[X](p_i) \in \downarrow Sat[X,\alpha]$

Moreover, for all $i$, $\mathcal{G}_{\hat{k}}[X](c_i) \in \mathcal{T}[\alpha] \subseteq Sat[X,\alpha]$. Finally, using Lemma 9.17 we obtain

$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{G}_{\hat{k}}[X](p_0) + \mathcal{G}_{\hat{k}}[X](c_0) + \cdots + \mathcal{G}_{\hat{k}}[X](c_{n'}) + \mathcal{G}_{\hat{k}}[X](p_{n'+1})$

From Operation (1) the right-hand sum is in $\downarrow Sat[X,\alpha]$. We then conclude that $\mathcal{G}_k[X](p) \in
\downarrow Sat[X,\alpha]$, which terminates this case.
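
The lower bound on $k - n'$ used in Case 1 can be checked by a short computation. The derivation below is a sketch that assumes $m \leq |\mathbb{B}_s|$ (there are at most as many equivalence classes as labels), $|\mathbb{B}_s| \geq 1$, and uses $n = m(|\mathfrak{C}|+1)$, $n' < 2n$ and $k \geq 2|\mathbb{B}_s|^2(|\mathfrak{C}|+1)$.

```latex
\begin{align*}
k - n' &> 2|\mathbb{B}_s|^2(|\mathfrak{C}|+1) - 2m(|\mathfrak{C}|+1) \\
       &\geq 2|\mathbb{B}_s|^2(|\mathfrak{C}|+1) - 2|\mathbb{B}_s|(|\mathfrak{C}|+1) \\
       &= 2|\mathbb{B}_s|\,(|\mathbb{B}_s|-1)\,(|\mathfrak{C}|+1) \\
       &\geq 2(|\mathbb{B}_s|-1)^2(|\mathfrak{C}|+1).
\end{align*}
```

The same computation with $2n$ in place of $n'$ gives the corresponding bound on $k - 2n$, with a non-strict first inequality.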

Case 2: $p$ contains a $(\mathbb{B}_s[X],2n)$-pattern. In that case we conclude using induction,
Operation (1) and Operation (2). By hypothesis, we know that $p$ contains a $(\mathbb{B}_s[X],n)$-pattern,
let $p = p_0 + c_0 + \cdots + c_n + p_{n+1}$ be the associated leftmost decomposition. Since
$p$ contains a $(\mathbb{B}_s[X],2n)$-pattern, $p_{n+1}$ must contain a $(\mathbb{B}_s[X],n)$-pattern. We set $p_{n+1} =
p' + c'_0 + \cdots + c'_n + p'_{n+1}$ as the associated rightmost decomposition. In the end we get

$p = p_0 + c_0 + \cdots + c_n + p' + c'_0 + p'_1 + \cdots + c'_n + p'_{n+1}$

Set $\hat{k} = k - 2n$ and observe that by choice of $k$, $\hat{k} \geq 2(|\mathbb{B}_s| - 1)^2(|\mathfrak{C}| + 1)$. Therefore, as in
the previous case, we get by induction that for all $i$, $\mathcal{G}_{\hat{k}}[X](p_i) \in \downarrow Sat[X,\alpha]$, $\mathcal{G}_{\hat{k}}[X](p'_i) \in
\downarrow Sat[X,\alpha]$, $\mathcal{G}_{\hat{k}}[X](c_i) \in \downarrow Sat[X,\alpha]$ and $\mathcal{G}_{\hat{k}}[X](c'_i) \in \downarrow Sat[X,\alpha]$. Using the same inductive
argument for $p'$ may not be possible as $p'$ might contain all labels in $\mathbb{B}_s$.

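If $p'$ does not contain all labels in $\mathbb{B}_s$, then, by induction, $\mathcal{G}_{\hat{k}}[X](p') \in \downarrow Sat[X,\alpha]$ (recall
that $\hat{k} = k - 2n$) and we can then use Lemma 9.17 as in Case 1 to conclude that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$. Assume
now that $p'$ contains all labels in $\mathbb{B}_s$. Recall that $m$ is the number of $\mathbb{B}_s[X]$-equivalence
classes. For all $j \leq |\mathfrak{C}|$, set

$\mathcal{V}_j = \sum_{i=jm}^{m-1+jm} \left(\mathcal{G}_{\hat{k}}[X](p_i) + \mathcal{G}_{\hat{k}}[X](c_i)\right)$

and let $\mathcal{V}'_j$ be the analogous sum, over the same range $i = jm, \dots, m-1+jm$, built from the
factors of the rightmost decomposition.

Observe that for all $j$, by definition $\mathcal{V}_j, \mathcal{V}'_j$ have an alphabet which is an $X$-approximation
of $\mathbb{B}_s$ and by Operation (1), $\mathcal{V}_j, \mathcal{V}'_j \in \downarrow Sat[X,\alpha]$. Moreover, it follows from a pigeon-hole
principle argument that the sequences $\mathcal{V}_0 + \cdots + \mathcal{V}_{|\mathfrak{C}|}$ and $\mathcal{V}'_0 + \cdots + \mathcal{V}'_{|\mathfrak{C}|}$ must contain “loops”,
i.e. there exist $j_1 < j_2$ and $j'_1 < j'_2$ such that

$\mathcal{V}_0 + \cdots + \mathcal{V}_{j_1} = \mathcal{V}_0 + \cdots + \mathcal{V}_{j_2}$

and such that the symmetric equality holds for the primed sequence with the indices $j'_1, j'_2$.

Set $\mathcal{U}_1 = \mathcal{V}_0 + \cdots + \mathcal{V}_{j_1}$ and $\mathcal{U}_2 = \mathcal{V}_{j_1+1} + \cdots + \mathcal{V}_{j_2}$, and define $\mathcal{U}'_1$ (the loop) and $\mathcal{U}'_2$
symmetrically from the $\mathcal{V}'_j$.
Observe that by Operation (1), we have $\mathcal{U}_1, \mathcal{U}_2, \mathcal{U}'_1, \mathcal{U}'_2 \in \downarrow Sat[X,\alpha]$ and that by construction
the alphabets of $\mathcal{U}_2, \mathcal{U}'_1$ are $X$-approximations of $\mathbb{B}_s$. Moreover, a little algebra yields $\mathcal{U}_1 =
\mathcal{U}_1 + \mathcal{U}_2 = \mathcal{U}_1 + \omega\mathcal{U}_2$ and $\mathcal{U}'_2 = \mathcal{U}'_1 + \mathcal{U}'_2 = \omega\mathcal{U}'_1 + \mathcal{U}'_2$.

Let $p''$ be the factor of $p$ lying between the part accounted for by $\mathcal{U}_1 + \mathcal{U}_2$ and the part
accounted for by $\mathcal{U}'_1 + \mathcal{U}'_2$, so that $c_n + p' + c'_0$ is a factor of $p''$. Observe that by hypothesis on $p'$, $p''$
contains all labels in $\mathbb{B}_s$. It follows from Fact 9.14 and Lemma 9.17 that

$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \mathcal{U}_2 + \mathcal{G}_{\hat{k}}[X](p'') + \mathcal{U}'_1 + \mathcal{U}'_2 = \mathcal{U}_1 + \omega\mathcal{U}_2 + \mathcal{G}_{\hat{k}}[X](p'') + \omega\mathcal{U}'_1 + \mathcal{U}'_2$

Moreover, since $p''$ has alphabet $\mathbb{B}_s$, it is immediate that $\mathcal{G}_{\hat{k}}[X](p'') \sqsubseteq [[\mathbb{B}_s]]$. Therefore,

$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \omega\mathcal{U}_2 + [[\mathbb{B}_s]] + \omega\mathcal{U}'_1 + \mathcal{U}'_2$

It is now immediate from Operation (2) that $\omega\mathcal{U}_2 + [[\mathbb{B}_s]] + \omega\mathcal{U}'_1 \in \downarrow Sat[X,\alpha]$. By combining
this with Operation (1), we obtain

$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \omega\mathcal{U}_2 + [[\mathbb{B}_s]] + \omega\mathcal{U}'_1 + \mathcal{U}'_2 \in \downarrow Sat[X,\alpha]$

We conclude that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$, which terminates the proof.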

10. Other logics 

It turns out that the proof of Theorem 6.2 depends on the horizontal modalities of the logic
only via the notion of definability within shallow multicontexts. It can therefore be adapted
to many other horizontal modalities, assuming those can at least express the fact that two
nodes are siblings (i.e. can talk about the shallow multicontext of a given node). By tuning
this notion one can obtain several new characterizations. We illustrate this feature in this
section with the horizontal predicates $X_h$, $S$ and $S^{\neq}$, adopting the point of view of
temporal logic.

The semantics of these predicates is defined as follows. The formula $S^{\neq}\varphi$ holds at a
node $x$ if $\varphi$ holds at some sibling of $x$ distinct from $x$. It is a shorthand for $F_h\varphi \vee F_h^{-1}\varphi$.
The formula $S\varphi$ holds at $x$ if $\varphi$ holds at some sibling of $x$ including $x$. It is a shorthand
for $\varphi \vee S^{\neq}\varphi$. The predicates $X_h$ and $X_h^{-1}$ are the usual next sibling and previous sibling
modalities.

The vertical navigational modalities remain the same and the corresponding logics are
denoted by $EF + F^{-1}(S)$, $EF + F^{-1}(S^{\neq})$ and $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$. A characterization
can be obtained for each of them using the same scheme as for $EF + F^{-1}(F_h, F_h^{-1})$.

As before, $EF + F^{-1}(S^{\neq})$ and $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$ are equivalent to two-variable
fragments of first-order logic. $EF + F^{-1}(S^{\neq})$ has the same expressive power as
$FO^2(s, <_v)$ while $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$ corresponds to $FO^2(Succ_h, <_h, <_v)$. Here
$s(x, y)$ is a binary predicate that holds when $x$ and $y$ are siblings and $Succ_h$ is a binary
predicate that holds when $y$ is the next sibling of $x$. These facts can be proved along the same
lines as the equivalence between $FO^2(<_v, <_h)$ and $EF + F^{-1}(F_h, F_h^{-1})$, see Theorem EH

Note that this is no longer the case for $EF + F^{-1}(S)$, as languages defined in this
formalism are closed under bisimulation, while in the two-variable fragment of first-order
logic it is possible to quantify over incomparable nodes by using equality and negation,
which rules out closure under bisimulation.

The proof techniques presented in the previous sections require at least the power of
testing whether two nodes are siblings in order to extract a shallow multicontext within a
forest. Hence they cannot be applied to $FO^2(<_v)$, and finding a decidable characterization for
this logic remains an open question. Similarly, we rely on the fact that the child relation
cannot be expressed, and finding a decidable characterization in the presence of this predicate
also remains an open question.

As we do not have a vertical successor modality, the characterizations we obtain for
$EF + F^{-1}(S)$, $EF + F^{-1}(S^{\neq})$ and $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$ still require Identity (6.3)
on the vertical monoid $V$ of the forest algebra. Identity (6.2) is now replaced by the
appropriate identity corresponding to the new horizontal expressive power. Finally the
notion of saturation is adapted by replacing $\equiv_k^X$ with a notion reflecting the horizontal
expressive power of the logic. It is defined as in Section El by only modifying the allowed
moves in the game in order to reflect the horizontal expressive power of the associated logic
(the constraints on the labels remaining untouched within each game). In a similar way,
$\equiv_k$ is replaced in the proof by the suitable game. Besides these changes at the level of
definitions, the characterization is stated and proved as for Theorem 6.2.

10.1. $EF + F^{-1}(S)$. In this case the games on shallow multicontexts are defined with no
navigational constraints on Duplicator's moves: Duplicator can respond by choosing an
arbitrary node, the restriction being only on its label.

Note that the games no longer depend on $k$, as only the presence or absence of a
given symbol of $A_s$ inside the shallow multicontext matters. We write $S$-$\equiv$ and $S$-$\equiv^X$ for the
equivalence relations resulting from this game and its $X$-relaxed variant.

The following analog of Claim 5.3 is an immediate consequence of the definitions.

Claim 10.1. Let $X \subseteq H$ and $(p, x)$ be a node. There is an $EF + F^{-1}(S)$ formula $\psi_{p,x}$ having
one free variable and such that for any forest $s$, $\psi_{p,x}$ holds exactly at all nodes $(p', x')$ such
that $(p, x)$ $S$-$\equiv^X$ $(p', x')$.


Recall the definition of saturation given in Section 6.2. The notion of $S$-saturation is
obtained identically after replacing $\equiv_k^X$ with $S$-$\equiv^X$. With these new definitions we get:


Theorem 10.2. A regular forest language $L$ is definable in $EF + F^{-1}(S)$ iff its syntactic
morphism $\alpha : A^{\Delta} \to (H, V)$ satisfies:

a) $H$ satisfies the identities

$$2h = h \quad\text{and}\quad f + g = g + f \qquad (10.1)$$

b) $V$ satisfies Identity (6.3)

$$(uv)^{\omega} v (uv)^{\omega} = (uv)^{\omega}$$

c) the leaf completion of $\alpha$ is closed under $S$-saturation.


Note that (10.1) simply states that the logic is closed under bisimulation, hence reflecting
exactly the horizontal expressive power of $EF + F^{-1}(S)$.
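
As an illustrative aside, the two identities in (10.1) can be checked mechanically once the horizontal monoid is given explicitly. The following is a minimal sketch, assuming $H$ is finite and supplied as a list of elements with an addition function; this representation and the name `satisfies_10_1` are hypothetical and do not come from the paper.

```python
from itertools import product

def satisfies_10_1(elements, add):
    """Check 2h = h and f + g = g + f on a finite monoid.

    `elements` lists the elements of H and `add(f, g)` returns f + g.
    Both are assumptions of this sketch, not notation from the paper.
    """
    elements = list(elements)
    idempotent = all(add(h, h) == h for h in elements)
    commutative = all(add(f, g) == add(g, f)
                      for f, g in product(elements, repeat=2))
    return idempotent and commutative

# Subsets of {"a", "b"} under union are idempotent and commutative.
subsets = [frozenset(s) for s in ([], ["a"], ["b"], ["a", "b"])]
union_ok = satisfies_10_1(subsets, lambda f, g: f | g)

# Addition modulo 2 is commutative, but 1 + 1 differs from 1.
mod2_ok = satisfies_10_1([0, 1], lambda f, g: (f + g) % 2)
```

The second example fails only the idempotency half, which shows that the two identities are tested independently of each other.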

Concerning the proof of Theorem 10.2, aside from the initial choice of the integer $k$,
which is no longer necessary here, it is identical to the one we gave for Theorem 6.2 after
replacing Lemma 8.1 by the following result:


Lemma 10.3. Let $L$ be a language whose syntactic forest algebra satisfies the identities
stated in Theorem 10.2. For all shallow multicontexts $p$ $S$-$\equiv$ $p'$ and for all forests $s$, $p[s]$
and $p'[s]$ have the same forest type.

Proof. Since $p$ $S$-$\equiv$ $p'$ the forests $p[s]$ and $p'[s]$ contain the same symbols but possibly with
a different number of occurrences. It follows from (10.1) that $\alpha(p[s]) = \alpha(p'[s])$. □


10.2. $EF + F^{-1}(S^{\neq})$ and $FO^2(s, <_v)$. As before the key point is the allowed moves in the
Ehrenfeucht-Fraïssé games. In this case we only require that Duplicator moves to a different
position as soon as Spoiler does.

As in the previous section, the games no longer depend on $k$ as it only matters whether
or not a label occurs and whether or not it occurs twice. We write $S^{\neq}$-$\equiv$ and $S^{\neq}$-$\equiv^X$ for the
equivalence relations resulting from this game and its $X$-relaxed variant.

The following analog of Claim 5.3 is an immediate consequence of the definitions.

Claim 10.4. Let $X \subseteq H$ and $(p, x)$ be a node. There is an $EF + F^{-1}(S^{\neq})$ formula $\psi_{p,x}$
having one free variable and such that for any forest $s$, $\psi_{p,x}$ holds exactly at all nodes $(p', x')$
such that $(p, x)$ $S^{\neq}$-$\equiv^X$ $(p', x')$.

As in the previous case, replacing $\equiv_k^X$ with $S^{\neq}$-$\equiv^X$ in the definition of saturation yields
a new notion of saturation that we call $S^{\neq}$-saturation. We can show:


Theorem 10.5. A regular forest language $L$ is definable in $EF + F^{-1}(S^{\neq})$ iff its syntactic
morphism $\alpha : A^{\Delta} \to (H, V)$ satisfies:

a) $H$ satisfies the identities

$$3h = 2h \quad\text{and}\quad f + g = g + f \qquad (10.2)$$

b) $V$ satisfies Identity (6.3)

$$(uv)^{\omega} v (uv)^{\omega} = (uv)^{\omega}$$

c) the leaf completion of $\alpha$ is closed under $S^{\neq}$-saturation.

Notice that (10.2) reflects exactly the horizontal expressive power of $EF + F^{-1}(S^{\neq})$: no
horizontal order and counting up to threshold 2.

Concerning the proof of Theorem 10.5, aside from the initial choice of the integer $k$,
which is no longer necessary here, it is identical to the one we gave for Theorem 6.2 after
replacing Lemma 8.1 by the following result:

Lemma 10.6. Let $L$ be a language whose syntactic forest algebra satisfies the identities
stated in Theorem 10.5. For all shallow multicontexts $p$ $S^{\neq}$-$\equiv$ $p'$ and all forests $s$, $p[s]$ and
$p'[s]$ have the same forest type.

Proof. Since $p$ $S^{\neq}$-$\equiv$ $p'$ the forests $p[s]$ and $p'[s]$ contain the same symbols with the same
number of occurrences up to threshold 2. It follows from (10.2) that $\alpha(p[s]) = \alpha(p'[s])$. □
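
The counting arguments behind Lemmas 10.3 and 10.6 can be illustrated on plain label sequences. The sketch below is a deliberate simplification: it treats a shallow multicontext as the list of its labels, ignores the distinguished node and the relaxation by $X$, and only compares occurrence counts up to a threshold. The names `threshold_profile` and `same_up_to_threshold` are hypothetical. Threshold 1 corresponds to "the same symbols occur" and threshold 2 to "the same number of occurrences up to threshold 2".

```python
from collections import Counter

def threshold_profile(labels, threshold):
    """Map each label to min(number of occurrences, threshold)."""
    return {a: min(c, threshold) for a, c in Counter(labels).items()}

def same_up_to_threshold(p, q, threshold):
    """True iff p and q contain every label equally often up to `threshold`."""
    return threshold_profile(p, threshold) == threshold_profile(q, threshold)

p = ["a", "b", "a", "a"]
q = ["b", "a", "a"]
r = ["a", "b"]

# With threshold 1 only the set of labels matters.
same_sets = same_up_to_threshold(p, r, 1)
# With threshold 2, p and q agree: "a" occurs at least twice in both.
same_counts_pq = same_up_to_threshold(p, q, 2)
# With threshold 2, p and r differ: "a" occurs only once in r.
same_counts_pr = same_up_to_threshold(p, r, 2)
```

Under this simplification, the order of the labels never matters, which matches the commutativity identity required of $H$ in both theorems.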

10.3. $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$ and $FO^2(Succ_h, <_h, <_v)$. In this case Duplicator not
only must respect the direction in which Spoiler has moved his pebble, but she also must
place her pebble on the successor (predecessor) of the current position if this was also the
situation for Spoiler.

The games now depend on $k$ and we write $Suc$-$\equiv_k$ and $Suc$-$\equiv_k^X$ for the equivalence
relations resulting from this game and its $X$-relaxed variant.

The following analog of Claim 5.3 is an immediate consequence of the definitions.

Claim 10.7. Let $X \subseteq H$ and $(p, x)$ be a node. There is an $EF + F^{-1}(X_h, F_h, X_h^{-1}, F_h^{-1})$
formula $\psi_{p,x}$ having one free variable and such that for any forest $s$, $\psi_{p,x}$ holds exactly at
all nodes $(p', x')$ such that $(p, x)$ $Suc$-$\equiv_k^X$ $(p', x')$.

As in the previous cases, we obtain from $Suc$-$\equiv_k^X$ a new notion of saturation that we
call $X_h$-saturation. We can show:

Theorem 10.8. A regular forest language $L$ is definable in $FO^2(Succ_h, <_h, <_v)$ iff its
syntactic morphism $\alpha : A^{\Delta} \to (H, V)$ satisfies:

a) $H$ satisfies for all $h, g \in H$, for all $e \in H$ such that $2e = e$:

$$\omega(e + h + e + g + e) + g + \omega(e + h + e + g + e) = \omega(e + h + e + g + e) \qquad (10.3)$$

b) $V$ satisfies Identity (6.3)

$$(uv)^{\omega} v (uv)^{\omega} = (uv)^{\omega}$$

c) the leaf completion of $\alpha$ is closed under $X_h$-saturation.

Equation (10.3) is extracted from the following result, which is essentially proved in [TW98]
based on a result of [Alm96] (see Footnote on page 9).

Theorem 10.9 ([TW98], [Alm96]). A regular string language $L$ is definable in $FO^2(Succ, <)$
iff its syntactic semigroup $S$ satisfies for all $u, v \in S$, for all $e \in S$ such that $e^2 = e$:

$$(eueve)^{\omega} v (eueve)^{\omega} = (eueve)^{\omega}$$
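
For a concrete finite semigroup, the identity of Theorem 10.9 can be tested by brute force. The following is a minimal sketch under the assumption that the semigroup is given as a list of elements with an associative multiplication function; `omega_power` and `satisfies_theorem_10_9_identity` are hypothetical names introduced for illustration only.

```python
from itertools import product

def omega_power(x, mul):
    """Return the idempotent power of x in a finite semigroup."""
    power = x
    while mul(power, power) != power:
        power = mul(power, x)
    return power

def satisfies_theorem_10_9_identity(elements, mul):
    """Check (eueve)^omega v (eueve)^omega = (eueve)^omega for all u, v
    and all idempotents e of a finite semigroup."""
    elements = list(elements)
    idempotents = [e for e in elements if mul(e, e) == e]
    for e, u, v in product(idempotents, elements, elements):
        x = e
        for factor in (u, e, v, e):
            x = mul(x, factor)          # builds the product e u e v e
        w = omega_power(x, mul)
        if mul(mul(w, v), w) != w:
            return False
    return True

# The two-element semilattice ({0, 1}, *) satisfies the identity.
semilattice_ok = satisfies_theorem_10_9_identity([0, 1], lambda a, b: a * b)

# The cyclic group of order 2 does not: with e = 0 and v = 1 the
# idempotent power is the neutral element, and 0 + 1 + 0 differs from 0.
z2_ok = satisfies_theorem_10_9_identity([0, 1], lambda a, b: (a + b) % 2)
```

The loop enumerates all triples, so the cost is cubic in the size of the semigroup times the cost of computing an idempotent power.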

Again, the proof of Theorem 10.8 follows the lines of the proof of Theorem 6.2 after
replacing Lemma 8.1 by the following simple result:

Lemma 10.10. Let $L$ be a language whose syntactic forest algebra satisfies the identities
stated in Theorem 10.8. There exists a number $k'$ such that for all $k > k'$, all shallow
multicontexts $p$ $Suc$-$\equiv_k$ $p'$ and all forests $s$, $p[s]$ and $p'[s]$ have the same forest type.

Proof. This is a consequence of the fact that $H$ satisfies Identity (10.3). The proof is
identical to the one we provided for Lemma 8.1, replacing Theorem 4.2 by Theorem 10.9
and $\equiv_k$ with $Suc$-$\equiv_k$. □

10.4. Decidability. Deciding whether a regular forest language is definable in $EF + F^{-1}(S)$
and $EF + F^{-1}(S^{\neq})$ is simple from Theorem 10.2 and Theorem 10.5. As in Section 9 we
prove that the corresponding notions of saturation are equivalent to their abstract variant.
The latter are decidable because they do not depend on $k$ and, up to equivalence, only finitely
many $Q \subseteq A_s^+$ need to be considered.

However, for $FO^2(Succ, <)$, it is not clear how to generalize the construction of the
indistinguishable sets. We leave this and the status of deciding definability in $FO^2(Succ_h, <_h, <_v)$
as an open problem.

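Let $\ell$ be defined as in Proposition 9.10. We prove that for any $k > \ell$, $\mathcal{I}_k[\alpha, X] \subseteq \downarrow Sat[X,\alpha]$.
We will need the following definition.

Let $k \in \mathbb{N}$, $X \subseteq H$. To every shallow multicontext $q \in A_s^+$, we associate a configuration
$\mathcal{G}_k[X](q) \in \mathfrak{C}[\alpha, X]$. For any $p, x$ set $V_{p,x} = \{\beta(p', x') \mid (p, x) \equiv_k^X (p', x')\}$. We set

$$\mathcal{G}_k[X](q) = \{V_{q,y} \mid y \in q\}$$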

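The following two facts are immediate consequences of the definitions:

Fact 10.11. For all $k \le k' \in \mathbb{N}$, $X \subseteq H$ and $q \in A_s^+$ we have $\mathcal{G}_{k'}[X](q) \sqsubseteq \mathcal{G}_k[X](q)$.

Fact 10.12. For all $k \in \mathbb{N}$ and $X \subseteq H$ we have $\mathcal{I}_k[\alpha, X] = \downarrow\{\mathcal{G}_k[X](q) \mid q \in A_s^+\}$.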

We can now finish the proof of Proposition 9.10. The proof is by induction on the size
of the alphabet as stated in the proposition below.

Proposition 10.13. Let $B_s \subseteq A_s$, $k > 2|B_s|^2(|\mathfrak{C}| + 1)$ and $p$ a shallow multicontext such
that $p$ contains only labels in $B_s$. Then $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$.

Using Proposition 10.13 with $B_s = A_s$, we obtain that for any $k > \ell$ and any $p \in A_s^+$,
we have $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$. It then follows from Fact 10.12 that $\mathcal{I}_k[\alpha, X] \subseteq \downarrow Sat[X,\alpha]$,
which terminates the proof of Proposition 9.10. It now remains to prove Proposition 10.13.
The remainder of the section is devoted to this proof.

For the sake of simplifying the presentation, we assume that $p$ can be an empty shallow
multicontext denoted '$\varepsilon$' and that $Sat[X,\alpha]$ contains an artificial neutral element '0' such
that $\mathcal{G}_k[X](\varepsilon) = 0$ for any $k$. As $\varepsilon$ will be the only shallow multicontext having that property
this does not harm the generality of the proof.

As explained above, the proof is by induction on the size of $B_s$. The base case happens
when $B_s = \emptyset$. In that case, $p = \varepsilon$ and $\mathcal{G}_k[X](\varepsilon) \in Sat[X,\alpha]$ by definition. Assume now that
$B_s \neq \emptyset$, we set $k > 2|B_s|^2(|\mathfrak{C}| + 1)$ and $p$ as a shallow multicontext containing only labels in
$B_s$. We need to prove that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$.

First observe that when $p$ does not contain all labels in $B_s$, the result is immediate by
induction. Therefore, assume that $p$ contains all labels in $B_s$. We proceed as follows. First,
we define a new notion called a $(B_s[X], n)$-pattern. Intuitively, a shallow multicontext $q$
contains a $(B_s[X], n)$-pattern iff all labels in $B_s$ (modulo $B_s[X]$-equivalence) are repeated at
least $n$ times in $q$. Then, we prove that if $p$ contains a $(B_s[X], n)$-pattern for a large enough
$n$, then $\mathcal{G}_k[X](p)$ can be decomposed in such a way that it can be proved to be in $Sat[X,\alpha]$
by using induction on the factors, and Operations (1) and (2) to compose them. Otherwise,
we prove that $\mathcal{G}_k[X](p)$ can be decomposed as a sum of bounded length whose elements
can be proved to be in $Sat[X,\alpha]$ by induction. We then conclude using Operation (1). We
begin with the definition of $(B_s[X], n)$-patterns.

$(B_s[X], n)$-patterns. Consider the $B_s[X]$-equivalence of labels in $B_s$ and let $m$ be the
number of equivalence classes. We fix an arbitrary order on these classes that we denote
by $C_0, \dots, C_{m-1} \subseteq B_s$. Recall that $C_s$ is an $X$-approximation of $B_s$ iff $C_s$ contains at
least one element of each class. Let $n \in \mathbb{N}$. We say that a shallow multicontext $q$ contains
a $(B_s[X], n)$-pattern iff $q$ can be decomposed as

$$q = q_0 + c_0 + q_1 + c_1 + \cdots + q_n + c_n + q_{n+1}$$

such that for all $i \le n$, $c_i \in C_j$ (with $j = i \bmod m$) and $q_i$ is a (possibly empty) shallow
multicontext. In particular, the decomposition above is called the leftmost decomposition iff
for all $i \le n$ no label in $C_j$ (with $j = i \bmod m$) occurs in $q_i$. Symmetrically, in the rightmost
decomposition, for all $i \ge 0$, no label in $C_j$ (with $j = i \bmod m$) occurs in $q_{i+1}$. Observe that
by definition the leftmost and rightmost decompositions are unique. In the proof, we use
the following decomposition lemma.

Lemma 10.14 (Decomposition Lemma). Let $n \in \mathbb{N}$. Let $q$ be a shallow multicontext that
contains a $(B_s[X], n)$-pattern and let $q = q_0 + c_0 + \cdots + q_n + c_n + q_{n+1}$ be the associated leftmost
or rightmost decomposition. Then

$$\mathcal{G}_k[X](q) \sqsubseteq \mathcal{G}_{k-n}[X](q_0) + \mathcal{G}_{k-n}[X](c_0) + \cdots + \mathcal{G}_{k-n}[X](c_n) + \mathcal{G}_{k-n}[X](q_{n+1})$$

Proof. This is a simple Ehrenfeucht-Fraïssé game argument. Because of the missing boundary
labels within the $q_j$, using at most $n$ moves, Spoiler can make sure that the game stays
within the appropriate segment $q_j$ and can use the remaining $k - n$ moves for describing
that segment. □
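
The leftmost decomposition used in the lemma can be computed by a greedy left-to-right scan. The sketch below assumes a shallow multicontext is represented simply as a list of labels and that the equivalence classes $C_0, \dots, C_{m-1}$ are given as Python sets; the function name `leftmost_decomposition` and this representation are hypothetical.

```python
def leftmost_decomposition(q, classes, n):
    """Return (segments, letters) with
    q = segments[0] + [letters[0]] + ... + [letters[n]] + segments[n + 1],
    where letters[i] belongs to classes[i % m] and no label of that class
    occurs in segments[i]. Return None if q contains no such pattern.
    """
    m = len(classes)
    segments, letters = [], []
    start = 0
    for i in range(n + 1):
        target = classes[i % m]
        pos = start
        # Skip labels outside the expected class: they go into q_i.
        while pos < len(q) and q[pos] not in target:
            pos += 1
        if pos == len(q):
            return None  # the class never occurs again: no pattern of size n
        segments.append(q[start:pos])
        letters.append(q[pos])
        start = pos + 1
    segments.append(q[start:])  # the final segment q_{n+1}
    return segments, letters

classes = [{"a"}, {"b"}]
found = leftmost_decomposition(list("bbab"), classes, 1)
missing = leftmost_decomposition(list("abab"), classes, 4)
```

Always taking the first occurrence of the expected class is what makes the result unique, and the rightmost decomposition can be obtained by the symmetric scan from the right.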

This finishes the definition of patterns. Set $n = m(|\mathfrak{C}| + 1)$. We now consider two cases
depending on whether our shallow multicontext $p$ contains a $(B_s[X], 2n)$-pattern.

Case 1: $p$ does not contain a $(B_s[X], 2n)$-pattern. In that case we conclude using
induction and Operation (1). Let $n'$ be the largest number such that $p$ contains a $(B_s[X], n')$-pattern.
By hypothesis $n' < 2n$. Let $p = p_0 + c_0 + \cdots + c_{n'} + p_{n'+1}$ be the associated leftmost
decomposition. Observe that by definition, for $i \le n'$, $p_i$ uses a strictly smaller alphabet
than $B_s$. Moreover, since $p$ does not contain a $(B_s[X], n'+1)$-pattern this is also the case
for $p_{n'+1}$. Set $\hat k = k - n'$; by choice of $k$, we have $\hat k > 2(|B_s| - 1)^2(|\mathfrak{C}| + 1)$. Therefore, we
can use our induction hypothesis and for all $i$ we get

$$\mathcal{G}_{\hat k}[X](p_i) \in \downarrow Sat[X,\alpha]$$

Moreover, for all $i$, $\mathcal{G}_{\hat k}[X](c_i) \in T[\alpha] \subseteq Sat[X,\alpha]$. Finally, using Lemma 10.14 we obtain

$$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{G}_{\hat k}[X](p_0) + \mathcal{G}_{\hat k}[X](c_0) + \cdots + \mathcal{G}_{\hat k}[X](c_{n'}) + \mathcal{G}_{\hat k}[X](p_{n'+1})$$

From Operation (1) the right-hand sum is in $\downarrow Sat[X,\alpha]$. We then conclude that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$,
which terminates this case.

Case 2: $p$ contains a $(B_s[X], 2n)$-pattern. In that case we conclude using induction,
Operation (1) and Operation (2). By hypothesis, we know that $p$ contains a $(B_s[X], n)$-pattern;
let $p = p_0 + c_0 + \cdots + c_n + p_{n+1}$ be the associated leftmost decomposition. Since
$p$ contains a $(B_s[X], 2n)$-pattern, $p_{n+1}$ must contain a $(B_s[X], n)$-pattern. We set
$p_{n+1} = p' + c'_0 + \cdots + c'_n + p'_{n+1}$ as the associated rightmost decomposition. In the end we get

$$p = p_0 + c_0 + \cdots + c_n + p' + c'_0 + p'_1 + \cdots + c'_n + p'_{n+1}$$

Set $\hat k = k - 2n$ and observe that by choice of $k$, $\hat k > 2(|B_s| - 1)^2(|\mathfrak{C}| + 1)$. Therefore, as in
the previous case, we get by induction that for all $i$, $\mathcal{G}_{\hat k}[X](p_i) \in \downarrow Sat[X,\alpha]$, $\mathcal{G}_{\hat k}[X](p'_i) \in \downarrow Sat[X,\alpha]$,
$\mathcal{G}_{\hat k}[X](c_i) \in \downarrow Sat[X,\alpha]$ and $\mathcal{G}_{\hat k}[X](c'_i) \in \downarrow Sat[X,\alpha]$. Using the same inductive
argument for $p'$ may not be possible as $p'$ might contain all labels in $B_s$.

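If $p'$ does not contain all labels in $B_s$, then, by induction, $\mathcal{G}_{\hat k}[X](p') \in \downarrow Sat[X,\alpha]$ and we
can then use Lemma 10.14 as in Case 1 to conclude that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$. Assume
now that $p'$ contains all labels in $B_s$. Recall that $m$ is the number of $B_s[X]$-equivalence
classes. For all $j \le |\mathfrak{C}|$, set

$$\mathcal{V}_j = \sum_{i=jm}^{m-1+jm} \bigl(\mathcal{G}_{\hat k}[X](p_i) + \mathcal{G}_{\hat k}[X](c_i)\bigr)$$

and define $\mathcal{V}'_j$ symmetrically, as the sum over the same range $i = jm, \dots, m-1+jm$ of the
corresponding configurations of the primed factors.

Observe that for all $j$, by definition $\mathcal{V}_j$ and $\mathcal{V}'_j$ have an alphabet which is an $X$-approximation
of $B_s$ and, by Operation (1), $\mathcal{V}_j, \mathcal{V}'_j \in \downarrow Sat[X,\alpha]$. Moreover, it follows from a pigeon-hole
principle argument that the sequences $\mathcal{V}_0 + \cdots + \mathcal{V}_{|\mathfrak{C}|}$ and $\mathcal{V}'_0 + \cdots + \mathcal{V}'_{|\mathfrak{C}|}$ must contain “loops”,
i.e. there exist $j_1 < j_2$ and $j'_1 < j'_2$ such that

$$\mathcal{V}_0 + \cdots + \mathcal{V}_{j_1} = \mathcal{V}_0 + \cdots + \mathcal{V}_{j_2}$$

$$\mathcal{V}'_{j'_1} + \cdots + \mathcal{V}'_{|\mathfrak{C}|} = \mathcal{V}'_{j'_2} + \cdots + \mathcal{V}'_{|\mathfrak{C}|}$$

Set $\mathcal{U}_1 = \mathcal{V}_0 + \cdots + \mathcal{V}_{j_1}$, $\mathcal{U}_2 = \mathcal{V}_{j_1+1} + \cdots + \mathcal{V}_{j_2}$, $\mathcal{U}'_1 = \mathcal{V}'_{j'_1} + \cdots + \mathcal{V}'_{j'_2-1}$ and $\mathcal{U}'_2 = \mathcal{V}'_{j'_2} + \cdots + \mathcal{V}'_{|\mathfrak{C}|}$.
Observe that by Operation (1), we have $\mathcal{U}_1, \mathcal{U}_2, \mathcal{U}'_1, \mathcal{U}'_2 \in \downarrow Sat[X,\alpha]$ and that by construction
the alphabets of $\mathcal{U}_2$ and $\mathcal{U}'_1$ are $X$-approximations of $B_s$. Moreover, a little algebra yields
$\mathcal{U}_1 = \mathcal{U}_1 + \mathcal{U}_2 = \mathcal{U}_1 + \omega\mathcal{U}_2$ and $\mathcal{U}'_2 = \mathcal{U}'_1 + \mathcal{U}'_2 = \omega\mathcal{U}'_1 + \mathcal{U}'_2$.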
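
The pigeon-hole step can be made concrete: whenever a sequence in a finite monoid is long enough, two of its prefix sums coincide, and the prefix then absorbs the loop and its idempotent multiple. The following is a minimal sketch on a toy monoid (saturating addition on $\{0, \dots, 3\}$), not on the configurations of the proof; `find_loop` and `omega_multiple` are hypothetical names and the elements are assumed to be hashable.

```python
from functools import reduce

def find_loop(values, add):
    """Return (j1, j2) with j1 < j2 such that the prefix sums of
    values[0..j1] and values[0..j2] are equal, or None if all differ."""
    seen = {}
    prefix = None
    for j, v in enumerate(values):
        prefix = v if prefix is None else add(prefix, v)
        if prefix in seen:
            return seen[prefix], j
        seen[prefix] = j
    return None

def omega_multiple(x, add):
    """Return the idempotent multiple of x in a finite monoid."""
    multiple = x
    while add(multiple, multiple) != multiple:
        multiple = add(multiple, x)
    return multiple

def sat_add(a, b):
    """Toy monoid: addition on {0, 1, 2, 3} saturating at 3."""
    return min(a + b, 3)

values = [1, 1, 1, 1, 1]
j1, j2 = find_loop(values, sat_add)
u1 = reduce(sat_add, values[: j1 + 1])          # prefix up to j1
u2 = reduce(sat_add, values[j1 + 1 : j2 + 1])   # the loop between j1 and j2
absorbs_loop = sat_add(u1, u2) == u1
absorbs_omega = sat_add(u1, omega_multiple(u2, sat_add)) == u1
```

A sequence with more elements than the monoid has must contain such a loop, which is why the proof considers $|\mathfrak{C}| + 1$ blocks.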

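Let $p''$ be the factor of $p$ lying between the part accounted for by $\mathcal{U}_1 + \mathcal{U}_2$ and the part
accounted for by $\mathcal{U}'_1 + \mathcal{U}'_2$; it has the form $p'' = \cdots + c_n + p' + c'_0 + \cdots$. Observe that by
hypothesis on $p'$, $p''$ contains all labels in $B_s$. It follows from Fact 10.11 and Lemma 10.14 that

$$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \mathcal{U}_2 + \mathcal{G}_{\hat k}[X](p'') + \mathcal{U}'_1 + \mathcal{U}'_2 = \mathcal{U}_1 + \omega\mathcal{U}_2 + \mathcal{G}_{\hat k}[X](p'') + \omega\mathcal{U}'_1 + \mathcal{U}'_2$$

Moreover, since $p''$ has alphabet $B_s$, it is immediate that $\mathcal{G}_{\hat k}[X](p'') \subseteq [\![B_s]\!]$. Therefore,

$$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \omega\mathcal{U}_2 + [\![B_s]\!] + \omega\mathcal{U}'_1 + \mathcal{U}'_2$$

It is now immediate from Operation (2) that $\omega\mathcal{U}_2 + [\![B_s]\!] + \omega\mathcal{U}'_1 \in \downarrow Sat[X,\alpha]$. By combining
this with Operation (1), we obtain

$$\mathcal{G}_k[X](p) \sqsubseteq \mathcal{U}_1 + \omega\mathcal{U}_2 + [\![B_s]\!] + \omega\mathcal{U}'_1 + \mathcal{U}'_2 \in \downarrow Sat[X,\alpha]$$

We conclude that $\mathcal{G}_k[X](p) \in \downarrow Sat[X,\alpha]$, which terminates the proof.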

11. Discussion 

We have obtained a characterization for $FO^2(<_v, <_h)$ using identities on the syntactic forest
algebra and the new notion of saturation. Our proof technique applies to many other logical
formalisms, assuming these only differ from $FO^2(<_v, <_h)$ by their horizontal expressive power
and that they can at least express the fact that two nodes are siblings.

We have shown all these characterizations to be decidable except for $FO^2(Succ_h, <_h, <_v)$.
We leave this case as an open problem. As explained in Section 10, it would be enough to
generalize our algorithm for computing profiles (i.e. Proposition 9.10) to the appropriate
notion of profile for $FO^2(Succ_h, <_h, <_v)$.

Since $FO^2(<_v)$ is unable to express the sibling relation, it cannot be covered by our
techniques and we leave open the problem of finding a decidable characterization for this
logic.

It would also be interesting to incorporate the vertical successor in our proofs to obtain
a decidable characterization for $FO^2(Succ_h, <_h, Succ_v, <_v)$. This would yield a decidable
characterization of the navigational core of XPath. We believe this requires new ideas.

In terms of complexity, a rough analysis of the proof of Theorem 9.1 yields a 4-Exptime
upper bound on the complexity of the problem. It is likely that this can be improved. Recall
that the complexity of the same problem for the corresponding logics over words, which
amounts to checking (jh.dp, is polynomial in the size of the syntactic monoid.

It would also be interesting to obtain an equivalent characterization of $FO^2(<_v, <_h)$
which remains decidable while avoiding the cumbersome notion of saturation. For instance
it is not clear whether the notion of confusion introduced in [BSW12] can be used as a
replacement. We leave this as an open problem.

Acknowledgment. We thank the reviewers for their comments on earlier versions of this
article. Their comments led to significant improvements of the paper.